Course 3: Quantitative Finance & Portfolio Theory
Welcome to the Quantitative Finance course!
In this course, you'll master the mathematical foundations that power institutional trading systems, from statistical analysis to portfolio optimization and risk management.
| Attribute | Value |
|---|---|
| Modules | 18 |
| Duration | ~45 hours |
| Exercises | 108 |
| Prerequisites | Course 0 (Python for Finance) |
What You'll Build
By the end of this course, you'll have built:
- Statistical Analysis Tools - Analyze financial returns and fit distributions
- Portfolio Optimizer - Construct efficient portfolios using mean-variance optimization
- Risk Management System - Calculate VaR, CVaR, and stress test portfolios
- Monte Carlo Simulator - Simulate thousands of portfolio paths
- Performance Dashboard - Interactive analytics and attribution
- Production System - Deployed, monitored quantitative infrastructure
The Capstone Project integrates all these into a complete portfolio management system.
Course Structure
Part 1: Statistical Foundations (Modules 1-3)
└── Statistics, Returns, Time Series
│
▼
Part 2: Portfolio Theory (Modules 4-6)
└── Basics, Optimization, Advanced Techniques
│
▼
Part 3: Risk Modeling (Modules 7-9)
└── VaR, Beyond VaR, Factor Models
│
▼
Part 4: Simulation & Analytics (Modules 10-12)
└── Monte Carlo, Attribution, Dashboards
│
▼
Part 5: Production & Infrastructure (Modules 13-18)
└── Reporting, Execution, Microstructure, HFT, Cloud, Operations
│
▼
CAPSTONE PROJECT
Module Overview
Part 1: Statistical Foundations
| Module | Title | Key Concepts |
|---|---|---|
| 1 | Statistics for Finance | Descriptive stats, probability distributions, hypothesis testing, correlation |
| 2 | Return Analysis | Simple vs log returns, annualization, Sharpe ratio, drawdowns |
| 3 | Time Series Analysis | Stationarity, autocorrelation, ARIMA, volatility clustering |
Part 2: Portfolio Theory
| Module | Title | Key Concepts |
|---|---|---|
| 4 | Portfolio Basics | Risk/return tradeoff, diversification, efficient frontier |
| 5 | Portfolio Optimization | Mean-variance, Maximum Sharpe, Minimum Variance, constraints |
| 6 | Advanced Techniques | Risk parity, Black-Litterman, robust optimization |
Part 3: Risk Modeling
| Module | Title | Key Concepts |
|---|---|---|
| 7 | Value at Risk | Historical VaR, parametric VaR, Monte Carlo VaR |
| 8 | Beyond VaR | CVaR/Expected Shortfall, drawdown analysis, stress testing |
| 9 | Factor Models | CAPM, Fama-French, PCA-based factors, factor attribution |
Part 4: Simulation & Analytics
| Module | Title | Key Concepts |
|---|---|---|
| 10 | Monte Carlo Simulation | GBM, correlated assets, option pricing, portfolio simulation |
| 11 | Performance Attribution | Brinson attribution, factor attribution, contribution analysis |
| 12 | Building Dashboards | Plotly, real-time metrics, interactive visualizations |
Part 5: Production & Infrastructure
| Module | Title | Key Concepts |
|---|---|---|
| 13 | Professional Reporting | Automated reports, PDF generation, scheduling |
| 14 | Rebalancing & Execution | Calendar/threshold rebalancing, transaction costs, tax-loss harvesting |
| 15 | Market Microstructure | Order books, bid-ask spread, price impact, optimal execution |
| 16 | High-Frequency Concepts | Latency, co-location, HFT strategies, regulations |
| 17 | Cloud Deployment | AWS/GCP, Docker, serverless, CI/CD |
| 18 | 24/7 Operation | Monitoring, alerting, incident response, backup/recovery |
Prerequisites Check
Before starting, ensure you can run the following code without errors:
# Prerequisites Check
import sys
print(f"Python version: {sys.version}")
# Core libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Finance libraries
import yfinance as yf
from scipy import stats, optimize
print("\nAll prerequisites installed!")
print(f"NumPy: {np.__version__}")
print(f"Pandas: {pd.__version__}")
# Quick test: Download sample data
print("Testing data download...")
data = yf.download('SPY', period='1mo', progress=False)
print(f"Downloaded {len(data)} days of SPY data")
print("\nYou're ready to start Course 3!")
How to Use This Course
Learning Approach
- Read the concepts - Understand the theory before coding
- Run the examples - Execute all code cells to see results
- Do the exercises - Practice with guided and open-ended problems
- Check solutions - Compare your approach after attempting
- Build the project - Apply everything in the module project
Exercise Format
Each module has 6 exercises:
- 3 Guided exercises - Fill in the blanks with hints provided
- 3 Open-ended exercises - Build complete solutions from scratch
Solutions are provided in collapsible sections - try first before peeking!
Time Commitment
- Each module: ~2.5 hours
- Recommended pace: 1-2 modules per session
- Total course: ~45 hours
Let's Begin!
Start with Module 1: Statistics for Finance — good luck on your quantitative finance journey!
Module 1: Statistics for Finance
Course 3: Quantitative Finance & Portfolio Theory
Part 1: Statistical Foundations
Learning Objectives
By the end of this module, you will be able to:
- Calculate and interpret descriptive statistics for financial returns
- Apply probability distributions to model asset prices and returns
- Conduct hypothesis tests to validate trading strategies
- Compute and analyze correlation and covariance matrices
| Attribute | Value |
|---|---|
| Duration | ~2.5 hours |
| Exercises | 6 (3 guided + 3 open-ended) |
| Prerequisites | Course 0 (Python for Finance) |
Setup and Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
from scipy import stats
import warnings
warnings.filterwarnings('ignore')
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
np.random.seed(42)
print('Libraries loaded successfully!')
Load Financial Data
# Download stock data
tickers = ['SPY', 'AAPL', 'MSFT', 'GLD', 'TLT']
start_date = '2019-01-01'
end_date = '2024-01-01'
print(f'Downloading data for {tickers}...')
data = yf.download(tickers, start=start_date, end=end_date, progress=False)
# Handle different yfinance column structures
if isinstance(data.columns, pd.MultiIndex):
    if 'Adj Close' in data.columns.get_level_values(0):
        prices = data['Adj Close']
    elif 'Close' in data.columns.get_level_values(0):
        prices = data['Close']
    else:
        prices = data.xs('Close', axis=1, level=1) if 'Close' in data.columns.get_level_values(1) else data.iloc[:, :len(tickers)]
else:
    prices = data['Adj Close'] if 'Adj Close' in data.columns else data['Close']
returns = prices.pct_change().dropna()
print(f'Data range: {prices.index.min().date()} to {prices.index.max().date()}')
print(f'Trading days: {len(prices)}')
prices.tail()
Section 1.1: Descriptive Statistics
Before diving into complex models, we need to understand our data. Descriptive statistics give us a snapshot of what we're working with.
In this section, you will learn:
- How to measure the "center" of returns (mean, median)
- How to measure the "spread" of returns (variance, std deviation)
- How to measure the "shape" of returns (skewness, kurtosis)
1.1.1 Central Tendency: Mean vs Median
The mean is the average - simple to calculate but sensitive to outliers.
The median is the middle value - robust to outliers.
Why does this matter in finance?
- A few extreme days (like March 2020) can significantly skew the mean
- The median tells you what a "typical" day looks like
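To make the outlier effect concrete, here is a minimal sketch using made-up daily returns (the numbers are illustrative, not market data). Adding a single crash day drags the mean below zero while the median barely moves.

```python
import numpy as np

# Hypothetical daily returns for five quiet trading days (illustrative values)
calm = np.array([0.001, 0.002, -0.001, 0.0015, 0.0005])

# The same sample plus one extreme day, loosely in the spirit of March 2020
with_crash = np.append(calm, -0.10)

calm_mean, calm_median = calm.mean(), np.median(calm)
crash_mean, crash_median = with_crash.mean(), np.median(with_crash)

print(f"Calm sample: mean={calm_mean:.5f}, median={calm_median:.5f}")
print(f"With crash:  mean={crash_mean:.5f}, median={crash_median:.5f}")
```

One bad day out of six is enough to flip the sign of the mean here, which is why it helps to report both statistics side by side.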
# Let's look at SPY returns
spy_returns = returns['SPY']
# Calculate mean and median
mean_ret = spy_returns.mean()
median_ret = spy_returns.median()
print('=== SPY Daily Returns ===')
print(f'Mean: {mean_ret:.6f} (annualized: {mean_ret*252:.2%})')
print(f'Median: {median_ret:.6f}')
print(f'\nDifference: {(mean_ret - median_ret):.6f}')
1.1.2 Dispersion: Variance & Volatility
Returns fluctuate. We measure this with:
- Variance (σ²): Average squared deviation from mean
- Standard Deviation (σ): Square root of variance
- Volatility: Annualized std = σ × √252
The key insight: Volatility is often used as a proxy for risk.
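The chain from variance to annualized volatility can be sketched step by step. The returns below are hypothetical, and the √252 scaling assumes about 252 trading days per year and daily returns that are roughly uncorrelated from one day to the next.

```python
import numpy as np

TRADING_DAYS = 252  # conventional number of trading days per year

# Hypothetical daily returns (illustrative values, not market data)
daily = np.array([0.012, -0.008, 0.005, -0.015, 0.009, 0.002, -0.004])

# ddof=1 gives the sample variance, matching the default of pandas' Series.std()
variance = daily.var(ddof=1)
daily_std = np.sqrt(variance)

# Square-root-of-time scaling: assumes daily returns are roughly uncorrelated
annual_vol = daily_std * np.sqrt(TRADING_DAYS)

print(f"Daily variance:        {variance:.6f}")
print(f"Daily std deviation:   {daily_std:.4%}")
print(f"Annualized volatility: {annual_vol:.2%}")
```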
# Calculate volatility for all assets
print('=== Annualized Volatility ===')
print('(Higher = more risky)\n')
for ticker in tickers:
    daily_std = returns[ticker].std()
    annual_vol = daily_std * np.sqrt(252)
    print(f'{ticker}: {annual_vol:.1%}')
1.1.3 Shape: Skewness & Kurtosis
Beyond center and spread, the shape of returns matters enormously.
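Skewness measures asymmetry:
- Negative skew = Longer left tail: extreme losses are larger or more frequent than extreme gains (bad for investors!)
- Positive skew = Longer right tail: extreme gains are larger or more frequent than extreme losses
- Zero skew = Symmetric distribution
Kurtosis measures "tail thickness". The values below refer to excess kurtosis (kurtosis minus 3), which is what pandas' .kurtosis() reports:
- High kurtosis (>0) = Fat tails = More extreme events than a Normal distribution predicts
- Zero kurtosis = Normal distribution tails
- Negative kurtosis = Thin tails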
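As a rough illustration of what these two numbers capture, the sketch below computes moment-based skewness and excess kurtosis by hand on synthetic samples. The helper names are hypothetical, and pandas applies bias corrections, so its `.skew()` and `.kurtosis()` will differ slightly on small samples.

```python
import numpy as np

def sample_skewness(x: np.ndarray) -> float:
    """Moment-based skewness: the third standardized moment."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))

def sample_excess_kurtosis(x: np.ndarray) -> float:
    """Moment-based excess kurtosis: fourth standardized moment minus 3."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 4) - 3.0)

rng = np.random.default_rng(0)
normal_sample = rng.normal(size=100_000)              # Normal tails
fat_tail_sample = rng.standard_t(df=5, size=100_000)  # fat tails

for name, sample in [("Normal", normal_sample), ("t (df=5)", fat_tail_sample)]:
    print(f"{name:<10} skew={sample_skewness(sample):6.2f} "
          f"excess kurtosis={sample_excess_kurtosis(sample):6.2f}")
```

The Normal sample should land near zero on both measures, while the t-distributed sample shows clearly positive excess kurtosis.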
# Calculate skewness and kurtosis
print('=== Return Distribution Shape ===')
print(f'{"Asset":<6} {"Skewness":>10} {"Kurtosis":>10}')
print('-' * 28)
for ticker in tickers:
    skew = returns[ticker].skew()
    kurt = returns[ticker].kurtosis()
    print(f'{ticker:<6} {skew:>10.2f} {kurt:>10.2f}')
Visualizing Return Distributions
# Visualize SPY return distribution
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Histogram with normal overlay
ax1 = axes[0]
ax1.hist(spy_returns, bins=50, density=True, alpha=0.7,
color='steelblue', edgecolor='white', label='Actual SPY')
x = np.linspace(spy_returns.min(), spy_returns.max(), 100)
ax1.plot(x, stats.norm.pdf(x, mean_ret, spy_returns.std()),
'r-', lw=2, label='Normal Distribution')
ax1.axvline(x=0, color='gray', linestyle='--', alpha=0.5)
ax1.set_xlabel('Daily Return')
ax1.set_ylabel('Density')
ax1.set_title('SPY Returns vs Normal Distribution')
ax1.legend()
# QQ-plot
ax2 = axes[1]
stats.probplot(spy_returns, dist='norm', plot=ax2)
ax2.set_title('Q-Q Plot: Are Returns Normal?')
plt.tight_layout()
plt.show()
Exercise 1.1: Calculate Descriptive Statistics (Guided)
Your Task: Complete the function to calculate key statistics for a return series.
Fill in the blanks to calculate mean, volatility, skewness, and kurtosis:
Solution:
def calculate_return_stats(returns_series: pd.Series) -> dict:
    """Calculate descriptive statistics for a return series."""
    mean_daily = returns_series.mean()
    volatility = returns_series.std() * np.sqrt(252)
    skewness = returns_series.skew()
    kurtosis = returns_series.kurtosis()
    return {
        'mean_daily': mean_daily,
        'mean_annual': mean_daily * 252,
        'volatility': volatility,
        'skewness': skewness,
        'kurtosis': kurtosis
    }
# Test
result = calculate_return_stats(returns['SPY'])
print(f"SPY Annual Return: {result['mean_annual']:.2%}")
print(f"SPY Volatility: {result['volatility']:.2%}")
Section 1.2: Probability Distributions
Now that we understand our data's shape, let's formalize it with probability distributions.
In this section, you will learn:
- Why the Normal distribution is useful (and where it fails)
- How the Student's t-distribution handles fat tails
- How to test if your data follows a specific distribution
1.2.1 The Normal Distribution
Despite its limitations, the Normal distribution is foundational:
- Parameters: mean (μ) and standard deviation (σ)
- Properties: 68-95-99.7 rule
- Use cases: Central Limit Theorem, log returns over long periods
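The 68-95-99.7 rule can be checked directly from the Normal CDF. This small sketch uses only the standard library; `prob_within` is a hypothetical helper name.

```python
import math

def prob_within(k: float) -> float:
    """P(|Z| <= k) for a standard Normal variable Z."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"Within {k} sigma: {prob_within(k):.4%}")
```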
# How often do extreme events occur under a Normal distribution?
print('=== Probability of Extreme Events (Normal Distribution) ===')
print()
spy_std = spy_returns.std()
spy_mean = spy_returns.mean()
for sigma in [2, 3, 4, 5]:
    prob = 2 * (1 - stats.norm.cdf(sigma))  # Two-tailed
    expected_days = int(1 / prob) if prob > 0 else float('inf')
    print(f'{sigma}-sigma event: {prob:.2e} probability')
    print(f' Expected once every {expected_days:,} trading days ({expected_days/252:.0f} years)')
    print()
# Count actual extreme events in SPY
print('=== Actual vs Expected Extreme Events in SPY ===')
print()
for sigma in [2, 3, 4, 5]:
    threshold = sigma * spy_std
    extreme_days = spy_returns[abs(spy_returns - spy_mean) > threshold]
    count = len(extreme_days)
    normal_prob = 2 * (1 - stats.norm.cdf(sigma))
    expected_count = normal_prob * len(spy_returns)
    print(f'{sigma}-sigma events: Actual={count}, Expected={expected_count:.1f}, Ratio={count/max(expected_count, 0.1):.1f}x')
1.2.2 The Student's t-Distribution
The t-distribution is like the Normal but with fatter tails.
- Key parameter: degrees of freedom (df or ν)
- Lower df = fatter tails
- As df → ∞, t-distribution → Normal distribution
- Financial returns typically fit with df = 3 to 8
# Compare Normal vs t-distribution
fig, ax = plt.subplots(figsize=(12, 6))
x = np.linspace(-5, 5, 1000)
ax.plot(x, stats.norm.pdf(x), 'b-', lw=2, label='Normal')
ax.plot(x, stats.t.pdf(x, df=3), 'r-', lw=2, label='t (df=3)')
ax.plot(x, stats.t.pdf(x, df=5), 'g-', lw=2, label='t (df=5)')
ax.plot(x, stats.t.pdf(x, df=10), 'orange', lw=2, label='t (df=10)')
ax.set_xlabel('Standard Deviations from Mean')
ax.set_ylabel('Probability Density')
ax.set_title('Normal vs Student\'s t-Distribution')
ax.legend()
ax.set_xlim(-5, 5)
plt.tight_layout()
plt.show()
print('Notice: Lower degrees of freedom = fatter tails = more extreme events')
# Fit t-distribution to SPY returns
df_fit, loc_fit, scale_fit = stats.t.fit(spy_returns)
print('=== Fitted t-Distribution Parameters ===')
print(f'Degrees of freedom: {df_fit:.2f}')
print(f'Location (mean): {loc_fit:.6f}')
print(f'Scale: {scale_fit:.6f}')
print(f'\nInterpretation: df={df_fit:.1f} (a low value, roughly below 10, indicates fat tails)')
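One subtlety worth spelling out: the scale parameter of a fitted t-distribution is not itself the standard deviation. For df > 2 the implied standard deviation is scale × √(df / (df − 2)). The sketch below applies that formula to hypothetical fitted values; `t_std_from_scale` is an illustrative helper, not part of SciPy.

```python
import math

def t_std_from_scale(df: float, scale: float) -> float:
    """Standard deviation implied by a Student's t fit.

    The variance is finite only for df > 2; otherwise return infinity.
    """
    if df <= 2:
        return math.inf
    return scale * math.sqrt(df / (df - 2))

# Hypothetical fitted values, for illustration only
example_df, example_scale = 4.0, 0.008
implied_std = t_std_from_scale(example_df, example_scale)
print(f"scale={example_scale:.4f} -> implied std={implied_std:.4f}")
```

With a low df, the implied standard deviation is noticeably larger than the scale, so the two should not be compared directly against a Normal fit's σ.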
1.2.3 Testing for Normality
Common tests:
- Jarque-Bera test: Based on skewness and kurtosis
- Shapiro-Wilk test: Compares data to Normal quantiles
Interpretation:
- p-value < 0.05 → Reject normality (data is NOT normal)
- p-value ≥ 0.05 → Cannot reject normality
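To show how the Jarque-Bera test combines skewness and kurtosis, here is a hand-rolled sketch of the statistic JB = n/6 × (S² + K²/4), where S is skewness and K is excess kurtosis. Under normality JB is approximately chi-square with 2 degrees of freedom, whose upper-tail probability is exp(−JB/2). The function name is hypothetical and the data is synthetic; the result should be close to what `stats.jarque_bera` reports.

```python
import math
import numpy as np

def jarque_bera_manual(x: np.ndarray) -> tuple[float, float]:
    """Return (JB statistic, asymptotic p-value) for a 1-D sample."""
    n = len(x)
    z = (x - x.mean()) / x.std()        # standardize with population std (ddof=0)
    skew = np.mean(z ** 3)
    excess_kurt = np.mean(z ** 4) - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)       # chi-square(2) survival function
    return float(jb), p_value

rng = np.random.default_rng(1)
fat_tailed = rng.standard_t(df=3, size=5_000)   # synthetic fat-tailed "returns"
jb, p = jarque_bera_manual(fat_tailed)
print(f"JB={jb:.1f}, p-value={p:.2e}")
```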
# Test for normality
print('=== Normality Tests ===')
print()
for ticker in tickers:
    ret = returns[ticker]
    jb_stat, jb_pval = stats.jarque_bera(ret)
    is_normal = 'YES' if jb_pval > 0.05 else 'NO'
    print(f'{ticker}: Jarque-Bera p-value={jb_pval:.2e}, Normal? {is_normal}')
Exercise 1.2: Fit a t-Distribution (Guided)
Your Task: Complete the function to fit a t-distribution and compare it to the Normal.
Fill in the blanks:
Solution:
def fit_and_compare_distributions(returns_series: pd.Series) -> dict:
    """Fit Normal and t-distribution, compare the fits."""
    norm_mean = returns_series.mean()
    norm_std = returns_series.std()
    t_df, t_loc, t_scale = stats.t.fit(returns_series)
    jb_stat, jb_pval = stats.jarque_bera(returns_series)
    return {
        'norm_mean': norm_mean,
        'norm_std': norm_std,
        't_df': t_df,
        't_loc': t_loc,
        't_scale': t_scale,
        'is_normal': jb_pval > 0.05,
        'jb_pval': jb_pval
    }
# Test
for ticker in tickers:
    result = fit_and_compare_distributions(returns[ticker])
    print(f"{ticker}: t-dist df={result['t_df']:.2f}, Normal? {result['is_normal']}")
Section 1.3: Hypothesis Testing
How do we know if a trading strategy actually works, or if we just got lucky?
Hypothesis testing helps us distinguish skill from randomness.
In this section, you will learn:
- How to formulate null and alternative hypotheses
- How to conduct t-tests on financial returns
- How to interpret p-values correctly
1.3.1 The Hypothesis Testing Framework
Null Hypothesis (H₀): The default assumption (usually "no effect")
- Example: "My strategy's mean return is zero"
Alternative Hypothesis (H₁): What we're trying to prove
- Example: "My strategy's mean return is positive"
The p-value: Probability of seeing our result (or more extreme) if H₀ is true
- p < 0.05 → Reject H₀ (result is "statistically significant")
- p ≥ 0.05 → Cannot reject H₀
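A worked example helps show why significance is hard to reach with noisy daily returns. The inputs below are hypothetical round numbers, and the p-value uses a Normal approximation to the t-distribution, which is reasonable only because the sample is large.

```python
import math
from statistics import NormalDist

# Hypothetical inputs, roughly the scale of daily equity returns
mean_daily = 0.0004    # sample mean daily return (about 10% annualized)
std_daily = 0.012      # sample standard deviation of daily returns
n_days = 1258          # about five years of trading days

standard_error = std_daily / math.sqrt(n_days)
t_stat = (mean_daily - 0.0) / standard_error   # H0: mean return is zero

# Normal approximation to the t-distribution (assumes large n)
p_two_sided = 2 * (1 - NormalDist().cdf(abs(t_stat)))

print(f"t-statistic: {t_stat:.2f}")
print(f"Approx. two-sided p-value: {p_two_sided:.3f}")
```

Even a mean return of roughly 10% a year fails to clear the 5% threshold here, because the standard error shrinks only with the square root of the number of observations.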
# Test whether SPY's mean daily return differs from zero (two-sided test)
print('=== One-Sample t-Test: Is SPY Mean Return Zero? ===')
print()
t_stat, p_value = stats.ttest_1samp(spy_returns, 0)
print(f'Sample mean: {spy_returns.mean():.6f}')
print(f'Sample size: {len(spy_returns)}')
print(f't-statistic: {t_stat:.4f}')
print(f'p-value: {p_value:.4f}')
print()
if p_value < 0.05:
    print('Result: REJECT null hypothesis - SPY has significant non-zero returns')
else:
    print('Result: CANNOT reject null hypothesis')
1.3.2 Two-Sample t-Test: Comparing Returns
Often we want to compare two assets or two strategies.
Question: Does AAPL outperform SPY?
# Compare AAPL vs SPY
print('=== Two-Sample t-Test: AAPL vs SPY ===')
print()
aapl_ret = returns['AAPL']
spy_ret = returns['SPY']
t_stat, p_value = stats.ttest_ind(aapl_ret, spy_ret)
print(f'AAPL mean: {aapl_ret.mean()*252:.2%} annualized')
print(f'SPY mean: {spy_ret.mean()*252:.2%} annualized')
print(f't-statistic: {t_stat:.4f}')
print(f'p-value: {p_value:.4f}')
print()
if p_value < 0.05:
    print('Result: Significant difference between AAPL and SPY')
else:
    print('Result: No significant difference')
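A caveat on the test above: `stats.ttest_ind` treats the two samples as independent, but AAPL and SPY returns are observed on the same days and move together. A paired test (SciPy offers `stats.ttest_rel`) works on the daily differences instead. The sketch below uses synthetic data with a shared market factor, which is an assumption made purely for illustration, to show how much the standard error can differ between the two approaches.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1_000

# Synthetic data: both "assets" share a common market factor
market = rng.normal(0.0, 0.01, n)
asset_a = market + 0.0005 + rng.normal(0.0, 0.001, n)   # small daily edge
asset_b = market + rng.normal(0.0, 0.001, n)

mean_diff = asset_a.mean() - asset_b.mean()

# Independent-samples (Welch) standard error: ignores the shared factor
se_indep = np.sqrt(asset_a.var(ddof=1) / n + asset_b.var(ddof=1) / n)

# Paired standard error: a one-sample test on the daily differences
diff = asset_a - asset_b
se_paired = diff.std(ddof=1) / np.sqrt(n)

print(f"Independent t-stat: {mean_diff / se_indep:.2f}")
print(f"Paired t-stat:      {mean_diff / se_paired:.2f}")
```

Both statistics share the same numerator; the paired version simply removes the common market noise from the denominator, so it can detect a difference the independent test misses.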
Exercise 1.3: Hypothesis Test Setup (Guided)
Your Task: Complete the function to perform hypothesis tests on return series.
Fill in the blanks:
Solution:
def test_returns(returns1: pd.Series, returns2: pd.Series = None,
                 test_value: float = 0, alpha: float = 0.05) -> dict:
    """Perform one-sample or two-sample t-test on returns."""
    if returns2 is None:
        t_stat, p_val = stats.ttest_1samp(returns1, test_value)
        test_type = 'one-sample'
    else:
        t_stat, p_val = stats.ttest_ind(returns1, returns2)
        test_type = 'two-sample'
    is_significant = p_val < alpha
    return {
        'test_type': test_type,
        't_statistic': t_stat,
        'p_value': p_val,
        'is_significant': is_significant,
        'alpha': alpha
    }
# Test
result = test_returns(returns['SPY'])
print(f"SPY vs Zero: p={result['p_value']:.4f}, Significant? {result['is_significant']}")
result = test_returns(returns['GLD'], returns['TLT'])
print(f"GLD vs TLT: p={result['p_value']:.4f}, Significant? {result['is_significant']}")
Exercise 1.4: Complete Statistical Analysis (Open-ended)
Your Task:
Build a function that performs a complete statistical analysis of a return series:
- Calculate all descriptive statistics (mean, std, skewness, kurtosis)
- Fit a t-distribution and report degrees of freedom
- Test if mean return is significantly different from zero
- Return all results in a dictionary
Your implementation:
Solution:
def complete_statistical_analysis(returns_series: pd.Series, name: str = 'Asset') -> dict:
    """Perform complete statistical analysis of a return series."""
    # Descriptive statistics
    desc_stats = {
        'mean_daily': returns_series.mean(),
        'mean_annual': returns_series.mean() * 252,
        'std_daily': returns_series.std(),
        'volatility_annual': returns_series.std() * np.sqrt(252),
        'skewness': returns_series.skew(),
        'kurtosis': returns_series.kurtosis()
    }
    # Distribution fit
    t_df, t_loc, t_scale = stats.t.fit(returns_series)
    jb_stat, jb_pval = stats.jarque_bera(returns_series)
    dist_stats = {
        't_degrees_freedom': t_df,
        'is_normal': jb_pval > 0.05,
        'normality_pval': jb_pval
    }
    # Hypothesis test
    t_stat, p_val = stats.ttest_1samp(returns_series, 0)
    hyp_stats = {
        't_statistic': t_stat,
        'p_value': p_val,
        'significant_returns': p_val < 0.05
    }
    return {
        'name': name,
        'n_observations': len(returns_series),
        'descriptive': desc_stats,
        'distribution': dist_stats,
        'hypothesis_test': hyp_stats
    }
# Test
analysis = complete_statistical_analysis(returns['MSFT'], 'MSFT')
print(f"=== {analysis['name']} Analysis ===")
print(f"Annual Return: {analysis['descriptive']['mean_annual']:.2%}")
print(f"Volatility: {analysis['descriptive']['volatility_annual']:.2%}")
print(f"t-dist df: {analysis['distribution']['t_degrees_freedom']:.2f}")
print(f"Significant? {analysis['hypothesis_test']['significant_returns']}")
Section 1.4: Correlation & Covariance
Understanding how assets move together is fundamental to portfolio construction.
In this section, you will learn:
- The difference between covariance and correlation
- How to compute and interpret correlation matrices
- Why correlation matters for diversification
1.4.1 Covariance and Correlation
Covariance measures joint variability but is scale-dependent.
Correlation standardizes to -1 to +1:
- ρ = +1: Perfect positive correlation
- ρ = 0: No linear correlation
- ρ = -1: Perfect negative correlation
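The scale dependence of covariance is easy to see by changing units. In this sketch with hypothetical returns, expressing one series in percent multiplies the covariance by 100 but leaves the correlation untouched; `cov_and_corr` is an illustrative helper name.

```python
import numpy as np

# Hypothetical daily returns for two assets, in decimal form
x = np.array([0.010, -0.020, 0.015, 0.005, -0.010])
y = np.array([0.005, -0.010, 0.020, 0.000, -0.005])

def cov_and_corr(a: np.ndarray, b: np.ndarray) -> tuple[float, float]:
    """Sample covariance and Pearson correlation of two series."""
    cov = np.cov(a, b, ddof=1)[0, 1]
    corr = cov / (a.std(ddof=1) * b.std(ddof=1))
    return float(cov), float(corr)

cov_dec, corr_dec = cov_and_corr(x, y)
# Express x in percent instead of decimals: covariance rescales, correlation does not
cov_pct, corr_pct = cov_and_corr(x * 100, y)

print(f"Decimal units: cov={cov_dec:.6f}, corr={corr_dec:.3f}")
print(f"x in percent:  cov={cov_pct:.6f}, corr={corr_pct:.3f}")
```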
# Calculate correlation matrix
corr_matrix = returns.corr()
print('=== Correlation Matrix ===')
print(corr_matrix.round(3))
# Visualize correlations
fig, ax = plt.subplots(figsize=(10, 8))
im = ax.imshow(corr_matrix, cmap='RdBu_r', vmin=-1, vmax=1)
labels = corr_matrix.columns  # use the matrix's own column order, which may differ from `tickers`
ax.set_xticks(range(len(labels)))
ax.set_yticks(range(len(labels)))
ax.set_xticklabels(labels)
ax.set_yticklabels(labels)
for i in range(len(labels)):
    for j in range(len(labels)):
        ax.text(j, i, f'{corr_matrix.iloc[i, j]:.2f}',
                ha='center', va='center', color='black', fontsize=12)
ax.set_title('Asset Correlation Matrix', fontsize=14, fontweight='bold')
plt.colorbar(im, label='Correlation')
plt.tight_layout()
plt.show()
1.4.2 Diversification Benefits
When correlation < 1, portfolio risk < weighted average risk.
This is why diversification works!
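The claim follows from the two-asset variance formula σp² = w₁²σ₁² + w₂²σ₂² + 2w₁w₂ρσ₁σ₂. When ρ = 1 the right-hand side collapses to (w₁σ₁ + w₂σ₂)², the weighted average; any lower ρ shrinks the cross term. A small sketch with hypothetical volatilities of 20% and 15%:

```python
import math

def two_asset_vol(w1: float, vol1: float, vol2: float, rho: float) -> float:
    """Portfolio volatility for two assets with weights w1 and 1 - w1."""
    w2 = 1.0 - w1
    variance = (w1 * vol1) ** 2 + (w2 * vol2) ** 2 + 2 * w1 * w2 * rho * vol1 * vol2
    return math.sqrt(variance)

# Hypothetical annualized volatilities: 20% and 15%, equally weighted
for rho in (1.0, 0.5, 0.0, -0.3):
    print(f"rho={rho:+.1f}: portfolio vol = {two_asset_vol(0.5, 0.20, 0.15, rho):.2%}")
```

At ρ = 1 the result equals the 17.5% weighted average; at ρ = 0 it drops to 12.5%, below the volatility of either asset on its own.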
# Demonstrate diversification
print('=== Diversification Benefit ===')
print()
spy_vol = returns['SPY'].std() * np.sqrt(252)
tlt_vol = returns['TLT'].std() * np.sqrt(252)
correlation = returns['SPY'].corr(returns['TLT'])
print(f'SPY volatility: {spy_vol:.1%}')
print(f'TLT volatility: {tlt_vol:.1%}')
print(f'Correlation: {correlation:.3f}')
print()
# 50/50 portfolio
weighted_avg_vol = 0.5 * spy_vol + 0.5 * tlt_vol
portfolio_returns = 0.5 * returns['SPY'] + 0.5 * returns['TLT']
actual_portfolio_vol = portfolio_returns.std() * np.sqrt(252)
print(f'50/50 Portfolio:')
print(f' If correlation=1: {weighted_avg_vol:.1%}')
print(f' Actual: {actual_portfolio_vol:.1%}')
print(f' Benefit: {weighted_avg_vol - actual_portfolio_vol:.1%} reduction')
1.4.3 Rolling Correlations
Warning: Correlations change over time, especially during market stress.
# Rolling correlations
window = 60
rolling_corr_spy_tlt = returns['SPY'].rolling(window).corr(returns['TLT'])
rolling_corr_spy_gld = returns['SPY'].rolling(window).corr(returns['GLD'])
fig, ax = plt.subplots(figsize=(14, 6))
ax.plot(rolling_corr_spy_tlt.index, rolling_corr_spy_tlt, label='SPY-TLT', linewidth=1.5)
ax.plot(rolling_corr_spy_gld.index, rolling_corr_spy_gld, label='SPY-GLD', linewidth=1.5, alpha=0.8)
ax.axhline(y=0, color='gray', linestyle='--', alpha=0.5)
ax.set_xlabel('Date')
ax.set_ylabel('60-Day Rolling Correlation')
ax.set_title('Rolling Correlations Over Time')
ax.legend()
ax.set_ylim(-1, 1)
plt.tight_layout()
plt.show()
Exercise 1.5: Build a Correlation Analyzer (Open-ended)
Your Task:
Build a class that analyzes correlations between assets:
- Calculate the full correlation matrix
- Find the pair with minimum correlation (best for diversification)
- Find the pair with maximum correlation (highest risk concentration)
- Calculate rolling correlations for a given window
Your implementation:
Solution:
class CorrelationAnalyzer:
    """Analyze correlations between assets."""

    def __init__(self, returns_df: pd.DataFrame):
        self.returns = returns_df
        self.tickers = returns_df.columns.tolist()
        self.corr_matrix = returns_df.corr()

    def get_correlation_matrix(self) -> pd.DataFrame:
        """Return the correlation matrix."""
        return self.corr_matrix

    def find_min_correlation_pair(self) -> tuple:
        """Find the pair with minimum correlation."""
        corr_values = self.corr_matrix.values.copy()
        np.fill_diagonal(corr_values, 1)  # Diagonal set to the maximum so argmin ignores it
        min_idx = np.unravel_index(np.argmin(corr_values), corr_values.shape)
        asset1 = self.tickers[min_idx[0]]
        asset2 = self.tickers[min_idx[1]]
        min_corr = self.corr_matrix.loc[asset1, asset2]
        return (asset1, asset2, min_corr)

    def find_max_correlation_pair(self) -> tuple:
        """Find the pair with maximum correlation (excluding self)."""
        corr_values = self.corr_matrix.values.copy()
        np.fill_diagonal(corr_values, -1)  # Diagonal set to the minimum so argmax ignores it
        max_idx = np.unravel_index(np.argmax(corr_values), corr_values.shape)
        asset1 = self.tickers[max_idx[0]]
        asset2 = self.tickers[max_idx[1]]
        max_corr = self.corr_matrix.loc[asset1, asset2]
        return (asset1, asset2, max_corr)

    def rolling_correlation(self, asset1: str, asset2: str, window: int = 60) -> pd.Series:
        """Calculate rolling correlation between two assets."""
        return self.returns[asset1].rolling(window).corr(self.returns[asset2])
# Test
analyzer = CorrelationAnalyzer(returns)
min_pair = analyzer.find_min_correlation_pair()
max_pair = analyzer.find_max_correlation_pair()
print(f'Best diversification: {min_pair[0]}-{min_pair[1]} (corr={min_pair[2]:.3f})')
print(f'Highest correlation: {max_pair[0]}-{max_pair[1]} (corr={max_pair[2]:.3f})')
Exercise 1.6: Diversification Calculator (Open-ended)
Your Task:
Build a function that calculates the diversification benefit of combining two assets:
- Calculate individual asset volatilities
- Calculate the 50/50 portfolio volatility
- Calculate the "weighted average" volatility (if correlation=1)
- Calculate the diversification benefit (reduction in volatility)
- Return the percentage improvement
Your implementation:
Solution:
def calculate_diversification_benefit(returns_df: pd.DataFrame,
                                      asset1: str,
                                      asset2: str,
                                      weight1: float = 0.5) -> dict:
    """Calculate diversification benefit of combining two assets."""
    weight2 = 1 - weight1
    # Individual volatilities
    vol1 = returns_df[asset1].std() * np.sqrt(252)
    vol2 = returns_df[asset2].std() * np.sqrt(252)
    # Correlation
    correlation = returns_df[asset1].corr(returns_df[asset2])
    # Portfolio volatility
    port_returns = weight1 * returns_df[asset1] + weight2 * returns_df[asset2]
    port_vol = port_returns.std() * np.sqrt(252)
    # Weighted average (if correlation = 1)
    weighted_avg_vol = weight1 * vol1 + weight2 * vol2
    # Diversification benefit
    benefit = weighted_avg_vol - port_vol
    benefit_pct = benefit / weighted_avg_vol
    return {
        'asset1': asset1,
        'asset2': asset2,
        'correlation': correlation,
        'portfolio_vol': port_vol,
        'weighted_avg_vol': weighted_avg_vol,
        'diversification_benefit': benefit,
        'benefit_percentage': benefit_pct
    }
# Test
pairs = [('SPY', 'TLT'), ('SPY', 'GLD'), ('AAPL', 'MSFT')]
for a1, a2 in pairs:
    result = calculate_diversification_benefit(returns, a1, a2)
    print(f'{a1}/{a2}: corr={result["correlation"]:.3f}, benefit={result["benefit_percentage"]:.1%}')
Module Project: Statistical Analysis Report
Put together everything you've learned!
Your Challenge:
Create a comprehensive statistical analysis report for QQQ (Nasdaq 100 ETF):
- Descriptive Statistics: Mean, std, skewness, kurtosis
- Distribution Fit: Fit a t-distribution and interpret degrees of freedom
- Hypothesis Test: Test if QQQ has significantly different returns than SPY
- Correlation Analysis: How correlated is QQQ with our other assets?
# Module Project: Your implementation here
Solution:
# Download QQQ data
print('Downloading QQQ data...')
qqq_data = yf.download('QQQ', start=start_date, end=end_date, progress=False)
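# Prefer 'Adj Close' when available so QQQ is priced the same way as `prices` above
if isinstance(qqq_data.columns, pd.MultiIndex):
    level0 = qqq_data.columns.get_level_values(0)
    price_col = 'Adj Close' if 'Adj Close' in level0 else 'Close'
    qqq_prices = qqq_data[price_col]['QQQ']
else:
    price_col = 'Adj Close' if 'Adj Close' in qqq_data.columns else 'Close'
    qqq_prices = qqq_data[price_col]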
qqq_returns = qqq_prices.pct_change().dropna()
print('\n' + '='*60)
print('QQQ STATISTICAL ANALYSIS REPORT')
print('='*60)
# 1. Descriptive Statistics
print('\n--- 1. DESCRIPTIVE STATISTICS ---')
print(f'Annual Return: {qqq_returns.mean()*252:.2%}')
print(f'Volatility: {qqq_returns.std()*np.sqrt(252):.2%}')
print(f'Skewness: {qqq_returns.skew():.2f}')
print(f'Kurtosis: {qqq_returns.kurtosis():.2f}')
# 2. Distribution Fit
print('\n--- 2. DISTRIBUTION FIT ---')
t_df, t_loc, t_scale = stats.t.fit(qqq_returns)
jb_stat, jb_pval = stats.jarque_bera(qqq_returns)
print(f't-distribution df: {t_df:.2f}')
print(f'Normal? {jb_pval > 0.05}')
# 3. Hypothesis Test vs SPY
print('\n--- 3. HYPOTHESIS TEST: QQQ vs SPY ---')
common_idx = qqq_returns.index.intersection(returns['SPY'].index)
t_stat, p_val = stats.ttest_ind(qqq_returns.loc[common_idx], returns['SPY'].loc[common_idx])
print(f'p-value: {p_val:.4f}')
print(f'Significant difference? {p_val < 0.05}')
# 4. Correlations
print('\n--- 4. CORRELATION ANALYSIS ---')
for ticker in tickers:
    corr = qqq_returns.loc[common_idx].corr(returns[ticker].loc[common_idx])
    print(f'QQQ vs {ticker}: {corr:.3f}')
print('\n' + '='*60)
print('END OF REPORT')
print('='*60)
Key Takeaways
What You Learned
1. Descriptive Statistics
- Mean vs Median: Mean is affected by outliers; median shows "typical" values
- Volatility: Annualized standard deviation is the standard risk measure
- Skewness: Negative skew means more extreme losses (common in stocks)
- Kurtosis: High kurtosis means fat tails and more extreme events
2. Probability Distributions
- Financial returns are NOT normal: They have fat tails
- t-distribution: Better fits financial data (3-8 degrees of freedom typical)
- Risk models assuming normality underestimate extreme events
3. Hypothesis Testing
- p-value < 0.05: Reject null hypothesis (result is "significant")
- Financial data is noisy: Hard to prove statistical significance
- Statistical significance ≠ Practical significance
4. Correlation & Covariance
- Correlation ranges from -1 to +1: Easier to interpret than covariance
- Diversification works when correlation < 1
- Correlations change over time: Especially during crises
Coming Up Next
In Module 2: Return Analysis, we'll dive deeper into:
- Simple vs Log returns
- Annualization of returns and risk
- Risk-adjusted performance metrics (Sharpe, Sortino)
- Benchmark comparison and Alpha/Beta
Congratulations on completing Module 1! You now have the statistical foundation for quantitative finance.
Module 2: Return Analysis
Course 3: Quantitative Finance & Portfolio Theory
Part 1: Statistical Foundations
Learning Objectives
By the end of this module, you will be able to:
- Calculate and interpret simple vs logarithmic returns
- Properly annualize returns and volatility
- Compute risk-adjusted performance metrics (Sharpe, Sortino, Calmar)
- Compare strategies against benchmarks using Alpha, Beta, and Information Ratio
| Attribute | Value |
|---|---|
| Duration | ~2.5 hours |
| Exercises | 6 (3 guided + 3 open-ended) |
| Prerequisites | Module 1 (Statistics for Finance) |
Setup and Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
from scipy import stats
import warnings
warnings.filterwarnings('ignore')
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
np.random.seed(42)
print('Libraries loaded successfully!')
Load Financial Data
# Download stock data
tickers = ['SPY', 'AAPL', 'MSFT', 'GLD', 'TLT']
start_date = '2019-01-01'
end_date = '2024-01-01'
print(f'Downloading data for {tickers}...')
data = yf.download(tickers, start=start_date, end=end_date, progress=False)
# Handle different yfinance column structures
if isinstance(data.columns, pd.MultiIndex):
    if 'Adj Close' in data.columns.get_level_values(0):
        prices = data['Adj Close']
    elif 'Close' in data.columns.get_level_values(0):
        prices = data['Close']
    else:
        prices = data.xs('Close', axis=1, level=1) if 'Close' in data.columns.get_level_values(1) else data.iloc[:, :len(tickers)]
else:
    prices = data['Adj Close'] if 'Adj Close' in data.columns else data['Close']
# Calculate returns
simple_returns = prices.pct_change().dropna()
log_returns = np.log(prices / prices.shift(1)).dropna()
print(f'Data range: {prices.index.min().date()} to {prices.index.max().date()}')
print(f'Trading days: {len(prices)}')
prices.tail()
Section 2.1: Types of Returns
Not all returns are created equal! The way you calculate returns affects everything downstream.
In this section, you will learn:
- The difference between simple and log returns
- When to use each type
- Why this choice matters for your analysis
2.1.1 Simple Returns (Arithmetic Returns)
Formula: R = (P₁ - P₀) / P₀ = P₁/P₀ - 1
Pros:
- Intuitive to understand
- Additive across assets (for portfolio returns)
Cons:
- Not additive across time
- Can't simply sum daily returns to get total return
# Calculate simple returns
print('=== Simple (Arithmetic) Returns ===')
print()
print('First 5 days of SPY simple returns:')
print(simple_returns['SPY'].head())
print(f'\nSum of all simple returns: {simple_returns["SPY"].sum():.2%}')
# Compare sum of returns vs actual total return
spy_prices = prices['SPY']
# Actual total return
actual_total = (spy_prices.iloc[-1] / spy_prices.iloc[0]) - 1
# Sum of simple returns (WRONG approach)
sum_of_returns = simple_returns['SPY'].sum()
# Compounded returns (CORRECT approach)
compounded = (1 + simple_returns['SPY']).prod() - 1
print('=== Total Return Calculation ===')
print()
print(f'Actual total return: {actual_total:.2%}')
print(f'Sum of simple returns: {sum_of_returns:.2%} ← WRONG!')
print(f'Compounded returns: {compounded:.2%} ← CORRECT!')
print()
print('Lesson: Never sum simple returns to get total return!')
2.1.2 Log Returns (Continuously Compounded Returns)
Formula: r = ln(P₁/P₀) = ln(P₁) - ln(P₀)
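Pros:
- Additive across time (sum of log returns = total log return)
- Symmetric: a log return of +x followed by -x brings the price back to where it started, whereas a +50% simple return followed by -50% leaves you down 25%
- Often closer to normally distributed than simple returns, which helps statistical modeling (daily log returns still show fat tails)
Cons:
- Not additive across assets
- Less intuitive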
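The symmetry point is easiest to see on a toy price path. The sketch below uses a hypothetical path (100 → 150 → 75) rather than real data: simple returns of +50% and -50% sum to zero even though the position lost 25%, while the log returns sum to the true total log return.

```python
import numpy as np

# Hypothetical price path: up 50%, then down 50%.
path = np.array([100.0, 150.0, 75.0])

simple = path[1:] / path[:-1] - 1        # [+0.50, -0.50]
log_rets = np.log(path[1:] / path[:-1])  # [ln 1.5, ln 0.5]

total_simple = path[-1] / path[0] - 1    # actual total return
total_log = np.log(path[-1] / path[0])   # actual total log return

print(f'Sum of simple returns: {simple.sum():+.4f}')
print(f'Actual total return:   {total_simple:+.4f}')
print(f'Sum of log returns:    {log_rets.sum():+.4f}')
print(f'Actual log return:     {total_log:+.4f}')

# A log return of +x followed by -x is a true round trip.
x = np.log(1.5)
round_trip = 100.0 * np.exp(x) * np.exp(-x)
print(f'Price after +x then -x in log terms: {round_trip:.2f}')
```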
# Log returns ARE additive across time!
actual_log_total = np.log(spy_prices.iloc[-1] / spy_prices.iloc[0])
sum_of_log = log_returns['SPY'].sum()
print('=== Log Returns: Time Additivity ===')
print()
print(f'Actual log total return: {actual_log_total:.4f}')
print(f'Sum of log returns: {sum_of_log:.4f}')
print(f'Difference: {abs(actual_log_total - sum_of_log):.6f}')
print()
print('They match! Log returns can be summed across time.')
2.1.3 Converting Between Return Types
- Simple to Log: r = ln(1 + R)
- Log to Simple: R = e^r - 1
# Convert between return types
print('=== Return Conversion ===')
print()
# Take first return as example
simple_r = simple_returns['SPY'].iloc[0]
log_r = log_returns['SPY'].iloc[0]
print(f'Original simple return: {simple_r:.6f}')
print(f'Original log return: {log_r:.6f}')
print()
# Convert simple to log
simple_to_log = np.log(1 + simple_r)
print(f'Simple → Log: ln(1 + {simple_r:.6f}) = {simple_to_log:.6f}')
# Convert log to simple
log_to_simple = np.exp(log_r) - 1
print(f'Log → Simple: e^{log_r:.6f} - 1 = {log_to_simple:.6f}')
2.1.4 When to Use Each Type?
| Use Case | Return Type | Reason |
|---|---|---|
| Multi-period analysis | Log | Additive across time |
| Portfolio returns | Simple | Additive across assets |
| Statistical modeling | Log | Better distributional properties |
| Reporting to clients | Simple | More intuitive |
# Visual comparison
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Distribution comparison
ax1 = axes[0]
ax1.hist(simple_returns['SPY'], bins=50, alpha=0.6, label='Simple', density=True)
ax1.hist(log_returns['SPY'], bins=50, alpha=0.6, label='Log', density=True)
ax1.set_xlabel('Return')
ax1.set_ylabel('Density')
ax1.set_title('Simple vs Log Returns Distribution')
ax1.legend()
# Scatter plot showing relationship
ax2 = axes[1]
ax2.scatter(simple_returns['SPY'], log_returns['SPY'], alpha=0.3, s=10)
ax2.plot([-0.15, 0.15], [-0.15, 0.15], 'r--', label='y = x')
ax2.set_xlabel('Simple Return')
ax2.set_ylabel('Log Return')
ax2.set_title('Simple vs Log Returns (nearly identical for small values)')
ax2.legend()
plt.tight_layout()
plt.show()
print('For small returns, simple ≈ log. For large returns, they diverge.')
Exercise 2.1: Calculate Total Returns (Guided)
Your Task: Calculate AAPL's total return using both compounded simple returns and summed log returns.
Fill in the blanks to complete the function:
Solution:
def calculate_total_return(simple_rets: pd.Series, log_rets: pd.Series) -> dict:
    """
    Calculate total return using both methods.
    """
    # Compound simple returns: (1 + r1) * (1 + r2) * ... - 1
    compounded = (1 + simple_rets).prod() - 1
    # Sum log returns and convert to simple
    log_sum = log_rets.sum()
    from_log = np.exp(log_sum) - 1
    return {
        'compounded_simple': compounded,
        'from_log': from_log
    }
# Test
result = calculate_total_return(simple_returns['AAPL'], log_returns['AAPL'])
print(f"Compounded: {result['compounded_simple']:.2%}")
print(f"From Log: {result['from_log']:.2%}")
Section 2.2: Annualization
Returns and risk are typically quoted on an annual basis for easy comparison.
In this section, you will learn:
- How to annualize returns properly
- How to annualize volatility
- Common pitfalls to avoid
2.2.1 Annualizing Returns
Key insight: Returns compound, so we should compound when annualizing.
Formulas:
- Daily to Annual: R_annual = (1 + R_daily)^252 - 1
- Weekly to Annual: R_annual = (1 + R_weekly)^52 - 1
- Monthly to Annual: R_annual = (1 + R_monthly)^12 - 1
# Compare different annualization methods
daily_mean = simple_returns['SPY'].mean()
print('=== Annualizing Daily Returns ===')
print()
print(f'Daily mean return: {daily_mean:.6f}')
print()
# Wrong way: simple multiplication
wrong_annual = daily_mean * 252
# Right way: compounding
right_annual = (1 + daily_mean) ** 252 - 1
print(f'WRONG (multiply by 252): {wrong_annual:.2%}')
print(f'RIGHT (compound): {right_annual:.2%}')
print(f'Difference: {right_annual - wrong_annual:.2%}')
print()
print('For small returns, the difference is small. For larger returns, it matters!')
2.2.2 Annualizing Volatility
Volatility is annualized differently because variance, not standard deviation, adds up across independent periods.
Formula: σ_annual = σ_daily × √252
Why square root?
- Variance (σ²) is additive for independent returns
- σ²_annual = σ²_daily × 252
- Taking the square root: σ_annual = σ_daily × √252
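The square-root rule can be checked with a quick simulation. This is a sketch under a strong assumption: daily returns are independent and identically distributed normal draws, which real returns only approximate. The daily volatility of 1% and the number of simulated years are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

daily_vol = 0.01            # assumed daily volatility (1%)
n_years, days = 5000, 252   # simulated years, trading days per year

# Assumption: i.i.d. normal daily log returns with zero mean.
daily = rng.normal(0.0, daily_vol, size=(n_years, days))

# Annual log return = sum of the 252 daily log returns in each year.
annual = daily.sum(axis=1)

simulated_annual_vol = annual.std(ddof=1)
sqrt_rule = daily_vol * np.sqrt(days)
linear_rule = daily_vol * days

print(f'Simulated annual vol:  {simulated_annual_vol:.4f}')
print(f'daily_vol * sqrt(252): {sqrt_rule:.4f}')
print(f'daily_vol * 252:       {linear_rule:.4f}')
```

The simulated figure should land close to the √252 value and far from the ×252 value. When returns are autocorrelated, the independence assumption fails and the rule is only an approximation.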
# Annualize volatility correctly
print('=== Annualized Volatility ===')
print()
for ticker in tickers:
    daily_vol = simple_returns[ticker].std()
    annual_vol = daily_vol * np.sqrt(252)
    print(f'{ticker}: Daily {daily_vol:.4f} → Annual {annual_vol:.2%}')
# Wrong vs right volatility annualization
daily_vol = simple_returns['SPY'].std()
wrong_vol = daily_vol * 252
right_vol = daily_vol * np.sqrt(252)
print('=== Volatility Annualization ===')
print()
print(f'Daily volatility: {daily_vol:.4f}')
print()
print(f'WRONG (multiply by 252): {wrong_vol:.2%} ← Nonsensical!')
print(f'RIGHT (multiply by √252): {right_vol:.2%} ← Makes sense')
print()
print('Remember: √252 ≈ 15.87')
Exercise 2.2: Annualize Quarterly Data (Guided)
Your Task: Calculate quarterly returns for AAPL and annualize both return and volatility.
Fill in the blanks:
Solution:
def annualize_quarterly(prices_series: pd.Series) -> dict:
    """
    Calculate and annualize quarterly statistics.
    """
    # Resample to quarterly prices (end of quarter)
    # Note: pandas >= 2.2 deprecates the 'Q' alias in favor of 'QE'
    quarterly_prices = prices_series.resample('Q').last()
    # Calculate quarterly returns
    quarterly_returns = quarterly_prices.pct_change().dropna()
    # Calculate quarterly statistics
    q_mean = quarterly_returns.mean()
    q_vol = quarterly_returns.std()
    # Annualize (4 quarters per year)
    annual_return = (1 + q_mean) ** 4 - 1
    annual_vol = q_vol * np.sqrt(4)
    return {
        'quarterly_return': q_mean,
        'quarterly_vol': q_vol,
        'annual_return': annual_return,
        'annual_vol': annual_vol
    }
# Test
result = annualize_quarterly(prices['AAPL'])
print(f"Quarterly Return: {result['quarterly_return']:.2%}")
print(f"Annualized Return: {result['annual_return']:.2%}")
print(f"Quarterly Vol: {result['quarterly_vol']:.2%}")
print(f"Annualized Vol: {result['annual_vol']:.2%}")
Section 2.3: Risk-Adjusted Returns
Raw returns don't tell the whole story. A 20% return with 50% volatility isn't as good as a 15% return with 10% volatility!
In this section, you will learn:
- Sharpe Ratio: The most popular risk-adjusted metric
- Sortino Ratio: Penalizes only downside risk
- Calmar Ratio: Uses maximum drawdown as risk
2.3.1 The Sharpe Ratio
Formula: Sharpe = (R_portfolio - R_f) / σ_portfolio
Where:
- R_portfolio = Portfolio return
- R_f = Risk-free rate (e.g., T-bills)
- σ_portfolio = Portfolio volatility
Interpretation:
- Sharpe < 1: Subpar risk-adjusted returns
- 1 ≤ Sharpe < 2: Good
- 2 ≤ Sharpe < 3: Very good
- Sharpe ≥ 3: Excellent (rare for long periods)
# Calculate Sharpe Ratio
risk_free_rate = 0.02 # Assume 2% annual risk-free rate
def calculate_sharpe(returns: pd.Series, risk_free_rate: float = 0.02) -> float:
    """Calculate annualized Sharpe ratio from daily returns."""
    # Convert the annual risk-free rate to a daily rate
    excess_returns = returns - (risk_free_rate / 252)
    return (excess_returns.mean() / excess_returns.std()) * np.sqrt(252)
print('=== Sharpe Ratios (Risk-Free Rate = 2%) ===')
print()
sharpe_ratios = {}
for ticker in tickers:
    sharpe = calculate_sharpe(simple_returns[ticker])
    sharpe_ratios[ticker] = sharpe
    # Labels follow the interpretation bands listed above
    quality = ('Excellent' if sharpe >= 3 else 'Very good' if sharpe >= 2
               else 'Good' if sharpe >= 1 else 'Subpar')
    print(f'{ticker}: {sharpe:.3f} ({quality})')
2.3.2 The Sortino Ratio
The Sharpe ratio penalizes all volatility equally. But investors mainly dislike downside volatility!
Formula: Sortino = (R_portfolio - R_f) / σ_downside
Where σ_downside only considers returns below a threshold (typically 0 or R_f).
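As a small worked example of σ_downside, the sketch below uses one common convention: square the shortfall below the target, count returns at or above the target as zero, and average over all observations. The helper name `downside_deviation` and the sample returns are hypothetical. Some implementations instead average over only the below-target observations, which produces a larger number.

```python
import numpy as np

def downside_deviation(returns, target=0.0):
    """Root-mean-square shortfall below `target`, averaged over all observations."""
    returns = np.asarray(returns, dtype=float)
    # Shortfall is zero whenever the return meets or beats the target.
    shortfall = np.minimum(returns - target, 0.0)
    return float(np.sqrt(np.mean(shortfall ** 2)))

# Hypothetical daily returns: three gains, two losses.
sample = [0.02, -0.01, 0.03, -0.03, 0.01]

dd = downside_deviation(sample)
total_vol = float(np.std(sample, ddof=1))

print(f'Downside deviation: {dd:.4f}')
print(f'Total volatility:   {total_vol:.4f}')
```

Only the two losing days contribute to the downside deviation, so it comes out smaller than the total volatility, which also counts the dispersion of the gains.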
def calculate_sortino(returns: pd.Series, risk_free_rate: float = 0.02, target: float = 0) -> float:
    """Calculate annualized Sortino ratio from daily returns."""
    excess_returns = returns - (risk_free_rate / 252)
    # Downside deviation: only returns below target, measured relative to target
    downside_returns = returns[returns < target] - target
    downside_std = np.sqrt(np.mean(downside_returns**2))
    return (excess_returns.mean() / downside_std) * np.sqrt(252)
print('=== Sortino Ratios (vs Sharpe) ===')
print()
print(f'{"Asset":<6} {"Sharpe":>10} {"Sortino":>10}')
print('-' * 28)
for ticker in tickers:
    sharpe = calculate_sharpe(simple_returns[ticker])
    sortino = calculate_sortino(simple_returns[ticker])
    print(f'{ticker:<6} {sharpe:>10.3f} {sortino:>10.3f}')
2.3.3 The Calmar Ratio
Instead of volatility, the Calmar ratio uses maximum drawdown as the risk measure.
Formula: Calmar = Annual Return / Maximum Drawdown
Maximum Drawdown: The largest peak-to-trough decline in portfolio value.
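A toy price series makes the definition easier to follow. The prices below are invented for illustration; the calculation tracks the running peak and measures each day's decline from it.

```python
import pandas as pd

# Hypothetical price path with two separate declines.
toy_prices = pd.Series([100.0, 120.0, 90.0, 110.0, 130.0, 104.0])

# Highest price seen so far at each point: 100, 120, 120, 120, 130, 130
running_max = toy_prices.cummax()

# Percentage decline from the running peak (0 when at a new high).
drawdown = toy_prices / running_max - 1

max_dd = drawdown.min()      # deepest decline: 120 -> 90
trough = drawdown.idxmin()   # position of the deepest point

print(drawdown.round(4).tolist())
print(f'Max drawdown: {max_dd:.2%} at index {trough}')
```

The fall from 120 to 90 (a 25% decline) is the maximum drawdown, even though the later fall from 130 to 104 is larger in price terms (26 versus 30 points, but only 20%).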
def calculate_max_drawdown(prices: pd.Series) -> float:
    """Calculate maximum drawdown from price series."""
    running_max = prices.cummax()
    drawdown = (prices - running_max) / running_max
    return drawdown.min()
def calculate_calmar(returns: pd.Series, prices: pd.Series) -> float:
    """Calculate Calmar ratio."""
    annual_return = (1 + returns.mean()) ** 252 - 1
    max_dd = calculate_max_drawdown(prices)
    return annual_return / abs(max_dd)
print('=== Maximum Drawdowns & Calmar Ratios ===')
print()
print(f'{"Asset":<6} {"Max Drawdown":>12} {"Annual Return":>14} {"Calmar":>10}')
print('-' * 46)
for ticker in tickers:
    max_dd = calculate_max_drawdown(prices[ticker])
    annual_ret = (1 + simple_returns[ticker].mean()) ** 252 - 1
    calmar = calculate_calmar(simple_returns[ticker], prices[ticker])
    print(f'{ticker:<6} {max_dd:>12.2%} {annual_ret:>14.2%} {calmar:>10.3f}')
# Visualize drawdowns
fig, axes = plt.subplots(2, 1, figsize=(14, 8))
# Normalize prices to start at 100
normalized = prices / prices.iloc[0] * 100
# Price chart
ax1 = axes[0]
for ticker in tickers:
    ax1.plot(normalized[ticker], label=ticker, linewidth=1.5)
ax1.set_ylabel('Normalized Price (100 = Start)')
ax1.set_title('Asset Prices Over Time')
ax1.legend(loc='upper left')
# Drawdown chart
ax2 = axes[1]
for ticker in tickers:
    running_max = prices[ticker].cummax()
    drawdown = (prices[ticker] - running_max) / running_max
    ax2.fill_between(drawdown.index, 0, drawdown, alpha=0.3, label=ticker)
ax2.set_ylabel('Drawdown')
ax2.set_title('Underwater Chart (Drawdowns)')
ax2.legend(loc='lower left')
ax2.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'{x:.0%}'))
plt.tight_layout()
plt.show()
Exercise 2.3: Calculate Risk-Adjusted Metrics (Guided)
Your Task: Build a function that calculates all three risk-adjusted ratios for a given asset.
Fill in the blanks:
Solution:
def risk_adjusted_metrics(returns: pd.Series, prices: pd.Series,
                          risk_free: float = 0.02) -> dict:
    """
    Calculate Sharpe, Sortino, and Calmar ratios.
    """
    # Calculate excess returns
    daily_rf = risk_free / 252
    excess = returns - daily_rf
    # Sharpe = annualized excess return / annualized volatility
    sharpe = (excess.mean() / excess.std()) * np.sqrt(252)
    # Sortino - only downside volatility
    downside = returns[returns < 0]
    downside_std = np.sqrt((downside ** 2).mean())
    sortino = (excess.mean() / downside_std) * np.sqrt(252)
    # Calmar = annual return / |max drawdown|
    annual_ret = (1 + returns.mean()) ** 252 - 1
    running_max = prices.cummax()
    drawdown = (prices - running_max) / running_max
    max_dd = drawdown.min()
    calmar = annual_ret / abs(max_dd)
    return {'sharpe': sharpe, 'sortino': sortino, 'calmar': calmar}
# Test
metrics = risk_adjusted_metrics(simple_returns['AAPL'], prices['AAPL'])
for name, value in metrics.items():
    print(f"{name.capitalize()}: {value:.3f}")
Section 2.4: Benchmark Comparison
How does your strategy compare to a simple benchmark? This section covers Alpha, Beta, and other benchmark-relative metrics.
In this section, you will learn:
- Alpha: Excess return vs benchmark
- Beta: Sensitivity to benchmark movements
- Information Ratio: Risk-adjusted active return
2.4.1 Beta: Market Sensitivity
Beta measures how much an asset moves relative to the market.
Formula: β = Cov(R_asset, R_market) / Var(R_market)
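Interpretation:
- β = 1: Moves one-for-one with the market on average
- β > 1: Amplifies market movements (more market risk)
- 0 < β < 1: Dampens market movements (less market risk)
- β < 0: Tends to move opposite to the market (rare)
Note that beta measures sensitivity to the market, not total volatility: a low-beta asset can still be highly volatile for reasons unrelated to the market.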
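The covariance formula for beta is the same number as the slope of a least-squares line of asset returns on market returns. The sketch below checks this on synthetic data; the "true" beta of 1.3, the noise level, and the sample size are assumptions chosen for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic daily returns: the asset is 1.3x the market plus independent noise.
true_beta = 1.3
market = rng.normal(0.0005, 0.01, size=2000)
noise = rng.normal(0.0, 0.005, size=2000)
asset = true_beta * market + noise

# Beta from the definition: Cov(asset, market) / Var(market)
cov_beta = np.cov(asset, market, ddof=1)[0, 1] / np.var(market, ddof=1)

# Beta as the slope of an ordinary least-squares fit of asset on market
slope, intercept = np.polyfit(market, asset, 1)

print(f'Covariance-based beta: {cov_beta:.4f}')
print(f'Regression slope:      {slope:.4f}')
```

Both estimates agree with each other to floating-point precision and sit near 1.3, differing from it only by sampling noise.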
# Calculate Beta using SPY as the market
market_returns = simple_returns['SPY']
def calculate_beta(asset_returns: pd.Series, market_returns: pd.Series) -> float:
    """Calculate beta relative to market."""
    covariance = asset_returns.cov(market_returns)
    market_variance = market_returns.var()
    return covariance / market_variance
print('=== Beta Values (vs SPY) ===')
print()
for ticker in tickers:
    if ticker == 'SPY':
        beta = 1.0
    else:
        beta = calculate_beta(simple_returns[ticker], market_returns)
    interpretation = 'More risky' if beta > 1 else ('Less risky' if beta < 1 else 'Same as market')
    print(f'{ticker}: {beta:.3f} ({interpretation})')
2.4.2 Alpha: Excess Return
Alpha measures return above what beta predicts.
Formula: α = R_asset - [R_f + β × (R_market - R_f)]
Interpretation:
- α > 0: Asset outperformed risk-adjusted expectations
- α < 0: Asset underperformed
- α = 0: Performance exactly as expected given beta
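Plugging numbers into the formula shows why a high raw return does not guarantee positive alpha. The helper `capm_alpha` and all inputs below are hypothetical annual figures chosen for the arithmetic.

```python
def capm_alpha(asset_return, market_return, beta, risk_free):
    """Alpha = realized return minus the CAPM-expected return."""
    expected = risk_free + beta * (market_return - risk_free)
    return asset_return - expected

# Case 1: 12% return, beta 1.2, market 9%, risk-free 2%
# Expected = 2% + 1.2 * (9% - 2%) = 10.4%, so alpha = 1.6%
alpha_a = capm_alpha(0.12, 0.09, 1.2, 0.02)

# Case 2: 10% return beats the market, but beta is 1.5
# Expected = 2% + 1.5 * 7% = 12.5%, so alpha = -2.5%
alpha_b = capm_alpha(0.10, 0.09, 1.5, 0.02)

print(f'Case 1 alpha: {alpha_a:+.2%}')
print(f'Case 2 alpha: {alpha_b:+.2%}')
```

In the second case the asset outperforms the market in raw terms yet has negative alpha, because its beta implied it should have earned even more.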
def calculate_alpha(asset_returns: pd.Series, market_returns: pd.Series,
                    risk_free_rate: float = 0.02) -> float:
    """Calculate annualized alpha using CAPM."""
    beta = calculate_beta(asset_returns, market_returns)
    # Annualized returns
    asset_annual = (1 + asset_returns.mean()) ** 252 - 1
    market_annual = (1 + market_returns.mean()) ** 252 - 1
    # Expected return based on CAPM
    expected = risk_free_rate + beta * (market_annual - risk_free_rate)
    # Alpha is the difference
    return asset_annual - expected
print('=== Alpha Values (Annualized) ===')
print()
for ticker in tickers:
    beta = calculate_beta(simple_returns[ticker], market_returns)
    alpha = calculate_alpha(simple_returns[ticker], market_returns)
    performance = 'Outperformed' if alpha > 0 else 'Underperformed'
    print(f'{ticker}: α = {alpha:>7.2%}, β = {beta:.3f} ({performance})')
2.4.3 Information Ratio
For active managers, we care about:
- Active Return: Return difference vs benchmark
- Tracking Error: Volatility of active returns
- Information Ratio: Active return / Tracking error
def calculate_information_ratio(asset_returns: pd.Series,
                                benchmark_returns: pd.Series) -> float:
    """Calculate Information Ratio."""
    active_returns = asset_returns - benchmark_returns
    active_return_annual = active_returns.mean() * 252
    tracking_error = active_returns.std() * np.sqrt(252)
    return active_return_annual / tracking_error
print('=== Benchmark-Relative Metrics (vs SPY) ===')
print()
print(f'{"Asset":<6} {"Active Return":>14} {"Tracking Error":>14} {"Info Ratio":>12}')
print('-' * 50)
for ticker in tickers:
    if ticker == 'SPY':
        continue
    active_ret = (simple_returns[ticker].mean() - market_returns.mean()) * 252
    te = (simple_returns[ticker] - market_returns).std() * np.sqrt(252)
    ir = calculate_information_ratio(simple_returns[ticker], market_returns)
    print(f'{ticker:<6} {active_ret:>14.2%} {te:>14.2%} {ir:>12.3f}')
Exercise 2.4: Build a Performance Comparison Tool (Open-ended)
Your Task:
Build a function that compares any asset against a benchmark and returns a comprehensive report including:
- Alpha and Beta
- Information Ratio
- Correlation with benchmark
- Relative Sharpe (asset Sharpe - benchmark Sharpe)
Your implementation:
Solution:
def compare_to_benchmark(asset_returns: pd.Series,
                         benchmark_returns: pd.Series,
                         risk_free: float = 0.02) -> dict:
    """
    Comprehensive benchmark comparison.
    Args:
        asset_returns: Daily returns of asset
        benchmark_returns: Daily returns of benchmark
        risk_free: Annual risk-free rate
    Returns:
        Dictionary with all comparison metrics
    """
    # Beta
    cov = asset_returns.cov(benchmark_returns)
    var = benchmark_returns.var()
    beta = cov / var
    # Alpha (annualized)
    asset_annual = (1 + asset_returns.mean()) ** 252 - 1
    bench_annual = (1 + benchmark_returns.mean()) ** 252 - 1
    expected = risk_free + beta * (bench_annual - risk_free)
    alpha = asset_annual - expected
    # Information Ratio
    active = asset_returns - benchmark_returns
    ir = (active.mean() * 252) / (active.std() * np.sqrt(252))
    # Correlation
    correlation = asset_returns.corr(benchmark_returns)
    # Sharpe comparison
    excess_asset = asset_returns - risk_free/252
    excess_bench = benchmark_returns - risk_free/252
    sharpe_asset = (excess_asset.mean() / excess_asset.std()) * np.sqrt(252)
    sharpe_bench = (excess_bench.mean() / excess_bench.std()) * np.sqrt(252)
    return {
        'alpha': alpha,
        'beta': beta,
        'information_ratio': ir,
        'correlation': correlation,
        'asset_sharpe': sharpe_asset,
        'benchmark_sharpe': sharpe_bench,
        'relative_sharpe': sharpe_asset - sharpe_bench
    }
# Test with AAPL vs SPY
comparison = compare_to_benchmark(simple_returns['AAPL'], simple_returns['SPY'])
print("=== AAPL vs SPY Comparison ===")
print(f"Alpha: {comparison['alpha']:.2%}")
print(f"Beta: {comparison['beta']:.3f}")
print(f"Information Ratio: {comparison['information_ratio']:.3f}")
print(f"Correlation: {comparison['correlation']:.3f}")
print(f"Asset Sharpe: {comparison['asset_sharpe']:.3f}")
print(f"Benchmark Sharpe: {comparison['benchmark_sharpe']:.3f}")
print(f"Relative Sharpe: {comparison['relative_sharpe']:.3f}")
Exercise 2.5: Multi-Asset Return Analysis (Open-ended)
Your Task:
Build a function that:
- Takes a list of tickers and a date range
- Calculates simple and log returns for each
- Computes annualized return and volatility
- Returns a sorted DataFrame (by Sharpe ratio)
Your implementation:
Solution:
def analyze_multiple_assets(tickers: list,
                            start: str,
                            end: str,
                            risk_free: float = 0.02) -> pd.DataFrame:
    """
    Analyze returns for multiple assets.
    Args:
        tickers: List of ticker symbols
        start: Start date string
        end: End date string
        risk_free: Annual risk-free rate
    Returns:
        DataFrame sorted by Sharpe ratio
    """
    # Download data
    data = yf.download(tickers, start=start, end=end, progress=False)
    # Handle column structure
    if isinstance(data.columns, pd.MultiIndex):
        prices = data['Adj Close'] if 'Adj Close' in data.columns.get_level_values(0) else data['Close']
    else:
        # Flat columns (single ticker): select the price column and label it with the ticker
        col = 'Adj Close' if 'Adj Close' in data.columns else 'Close'
        prices = data[[col]].rename(columns={col: tickers[0]})
    results = []
    for ticker in tickers:
        if ticker not in prices.columns:
            continue
        p = prices[ticker].dropna()
        simple_ret = p.pct_change().dropna()
        log_ret = np.log(p / p.shift(1)).dropna()
        # Annualized metrics
        annual_return = (1 + simple_ret.mean()) ** 252 - 1
        annual_vol = simple_ret.std() * np.sqrt(252)
        # Sharpe
        excess = simple_ret - risk_free/252
        sharpe = (excess.mean() / excess.std()) * np.sqrt(252)
        # Total return
        total_return = (p.iloc[-1] / p.iloc[0]) - 1
        results.append({
            'Ticker': ticker,
            'Total Return': total_return,
            'Annual Return': annual_return,
            'Annual Vol': annual_vol,
            'Sharpe': sharpe
        })
    df = pd.DataFrame(results).set_index('Ticker')
    return df.sort_values('Sharpe', ascending=False)
# Test
analysis = analyze_multiple_assets(
    ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'META'],
    '2020-01-01', '2024-01-01'
)
print(analysis.to_string(formatters={
    'Total Return': '{:.2%}'.format,
    'Annual Return': '{:.2%}'.format,
    'Annual Vol': '{:.2%}'.format,
    'Sharpe': '{:.3f}'.format
}))
Exercise 2.6: Drawdown Analysis Tool (Open-ended)
Your Task:
Build a comprehensive drawdown analyzer that:
- Calculates maximum drawdown and its duration
- Finds the top 5 worst drawdown periods
- Calculates average time to recovery
- Plots the underwater chart
Your implementation:
Solution:
class DrawdownAnalyzer:
    """
    Comprehensive drawdown analysis tool.
    """
    def __init__(self, prices: pd.Series):
        self.prices = prices
        self.running_max = prices.cummax()
        self.drawdown = (prices - self.running_max) / self.running_max

    def max_drawdown(self) -> float:
        """Return maximum drawdown."""
        return self.drawdown.min()

    def max_drawdown_duration(self) -> int:
        """Calculate duration of max drawdown in days."""
        # Find trough date
        trough_date = self.drawdown.idxmin()
        # Find peak before trough
        peak_date = self.prices[:trough_date].idxmax()
        # Find recovery date (if any)
        post_trough = self.prices[trough_date:]
        peak_value = self.prices[peak_date]
        recovery = post_trough[post_trough >= peak_value]
        if len(recovery) > 0:
            recovery_date = recovery.index[0]
            return (recovery_date - peak_date).days
        else:
            return (self.prices.index[-1] - peak_date).days

    def top_drawdowns(self, n: int = 5) -> pd.DataFrame:
        """Find top N drawdown periods."""
        # Simple approach: find local minima
        dd = self.drawdown.copy()
        results = []
        for i in range(n):
            if dd.min() >= 0:
                break
            trough_date = dd.idxmin()
            trough_value = dd.min()
            results.append({
                'Trough Date': trough_date,
                'Drawdown': trough_value
            })
            # Zero out this drawdown period
            mask = (dd.index >= trough_date - pd.Timedelta(days=30)) & \
                   (dd.index <= trough_date + pd.Timedelta(days=30))
            dd[mask] = 0
        return pd.DataFrame(results)

    def plot(self):
        """Plot underwater chart."""
        fig, ax = plt.subplots(figsize=(14, 6))
        ax.fill_between(self.drawdown.index, 0, self.drawdown,
                        alpha=0.5, color='red')
        ax.set_title('Underwater Chart (Drawdowns)')
        ax.set_ylabel('Drawdown')
        ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'{x:.0%}'))
        plt.tight_layout()
        plt.show()
# Test
analyzer = DrawdownAnalyzer(prices['SPY'])
print(f"Max Drawdown: {analyzer.max_drawdown():.2%}")
print(f"Max DD Duration: {analyzer.max_drawdown_duration()} days")
print("\nTop 5 Drawdowns:")
print(analyzer.top_drawdowns())
analyzer.plot()
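The task also asks for average time to recovery. One possible way to compute it is sketched below as a standalone helper; the name `recovery_times`, the trough-to-recovery definition, and the toy prices are assumptions for illustration, and other definitions (such as peak to recovery) are equally defensible. Drawdowns that have not recovered by the end of the series are skipped.

```python
import pandas as pd

def recovery_times(prices: pd.Series) -> list:
    """Calendar days from each drawdown's trough back to its prior peak."""
    running_max = prices.cummax()
    underwater = prices < running_max
    times = []
    start = None  # first underwater date of the current episode
    for date, is_under in underwater.items():
        if is_under and start is None:
            start = date                        # a new drawdown begins
        elif not is_under and start is not None:
            episode = prices.loc[start:date]    # underwater stretch plus recovery day
            trough = episode.idxmin()
            times.append((date - trough).days)
            start = None                        # episode closed
    return times

# Hypothetical daily prices with two drawdowns that both recover.
dates = pd.date_range('2024-01-01', periods=8, freq='D')
toy = pd.Series([100.0, 90.0, 80.0, 90.0, 100.0, 105.0, 95.0, 105.0], index=dates)

times = recovery_times(toy)
average_recovery = sum(times) / len(times)
print(f'Recovery times (days): {times}')
print(f'Average time to recovery: {average_recovery:.1f} days')
```

On real data the result is in calendar days because it subtracts index timestamps; counting rows between trough and recovery instead would give trading days.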
Module Project: Complete Performance Report
Create a professional performance report for a portfolio.
Your Challenge:
Create an equal-weight portfolio of SPY, AAPL, GLD, and TLT, then:
1. Calculate portfolio returns (use simple returns weighted by 25% each)
2. Compare to SPY as benchmark
3. Report: Annual return, Volatility, Sharpe, Sortino, Max Drawdown, Alpha, Beta
# YOUR CODE HERE - Build the portfolio and create a complete report
Solution:
# Complete Performance Report Solution
# Create equal-weight portfolio
portfolio_tickers = ['SPY', 'AAPL', 'GLD', 'TLT']
weights = [0.25, 0.25, 0.25, 0.25]
# Calculate portfolio returns
portfolio_returns = sum(w * simple_returns[t] for w, t in zip(weights, portfolio_tickers))
# Create portfolio price series for drawdown calculation
portfolio_prices = (1 + portfolio_returns).cumprod() * 100
print('='*60)
print('PORTFOLIO PERFORMANCE REPORT')
print('Equal-Weight: SPY (25%), AAPL (25%), GLD (25%), TLT (25%)')
print('='*60)
# Calculate all metrics
annual_return = (1 + portfolio_returns.mean()) ** 252 - 1
annual_vol = portfolio_returns.std() * np.sqrt(252)
# Sharpe
risk_free = 0.02
excess = portfolio_returns - risk_free/252
sharpe = (excess.mean() / excess.std()) * np.sqrt(252)
# Sortino
downside = portfolio_returns[portfolio_returns < 0]
downside_std = np.sqrt((downside ** 2).mean())
sortino = (excess.mean() / downside_std) * np.sqrt(252)
# Max Drawdown
running_max = portfolio_prices.cummax()
drawdown = (portfolio_prices - running_max) / running_max
max_dd = drawdown.min()
# Calmar
calmar = annual_return / abs(max_dd)
# Alpha and Beta (vs SPY)
market = simple_returns['SPY']
cov = portfolio_returns.cov(market)
var = market.var()
beta = cov / var
market_annual = (1 + market.mean()) ** 252 - 1
expected = risk_free + beta * (market_annual - risk_free)
alpha = annual_return - expected
# Benchmark metrics
spy_annual = market_annual
spy_vol = market.std() * np.sqrt(252)
spy_excess = market - risk_free/252
spy_sharpe = (spy_excess.mean() / spy_excess.std()) * np.sqrt(252)
spy_max_dd = ((prices['SPY'] - prices['SPY'].cummax()) / prices['SPY'].cummax()).min()
# Print Report
print()
print('--- Return Metrics ---')
print(f'{"Metric":<25} {"Portfolio":>15} {"SPY Benchmark":>15}')
print('-' * 55)
print(f'{"Annual Return":<25} {annual_return:>15.2%} {spy_annual:>15.2%}')
print(f'{"Annual Volatility":<25} {annual_vol:>15.2%} {spy_vol:>15.2%}')
print()
print('--- Risk-Adjusted Metrics ---')
print(f'{"Sharpe Ratio":<25} {sharpe:>15.3f} {spy_sharpe:>15.3f}')
print(f'{"Sortino Ratio":<25} {sortino:>15.3f}')
print(f'{"Calmar Ratio":<25} {calmar:>15.3f}')
print(f'{"Maximum Drawdown":<25} {max_dd:>15.2%} {spy_max_dd:>15.2%}')
print()
print('--- Benchmark-Relative Metrics ---')
print(f'{"Alpha":<25} {alpha:>15.2%}')
print(f'{"Beta":<25} {beta:>15.3f}')
print()
print('='*60)
print('SUMMARY: ', end='')
if sharpe > spy_sharpe and alpha > 0:
    print('Portfolio OUTPERFORMED SPY on risk-adjusted basis!')
else:
    print('Portfolio underperformed SPY on risk-adjusted basis.')
Key Takeaways
What You Learned
1. Types of Returns
- Simple returns: Intuitive, additive across assets, NOT additive across time
- Log returns: Additive across time, better statistical properties
- Use simple for portfolio weights, log for time-series analysis
2. Annualization
- Returns: Compound using (1 + r)^n - 1
- Volatility: Multiply by √n (not n!)
- Use 252 for daily stock data, 12 for monthly, etc.
3. Risk-Adjusted Metrics
- Sharpe Ratio: Return per unit of total risk (most popular)
- Sortino Ratio: Return per unit of downside risk
- Calmar Ratio: Return per unit of maximum drawdown
4. Benchmark Comparison
- Alpha: Return above what beta predicts (skill measure)
- Beta: Sensitivity to benchmark movements
- Information Ratio: Risk-adjusted active return vs benchmark
Formula Reference
| Metric | Formula |
|---|---|
| Simple Return | R = P₁/P₀ - 1 |
| Log Return | r = ln(P₁/P₀) |
| Annual Return | (1 + daily)^252 - 1 |
| Annual Vol | daily_vol × √252 |
| Sharpe | (R - Rf) / σ |
| Sortino | (R - Rf) / σ_downside |
| Calmar | Annual Return / Max DD |
| Beta | Cov(R, Rm) / Var(Rm) |
| Alpha | R - [Rf + β(Rm - Rf)] |
Coming Up Next
In Module 3: Time Series Analysis, we'll explore:
- Stationarity and why it matters
- Autocorrelation in returns
- Moving statistics (rolling, expanding, EWM)
- Volatility clustering and modeling
Congratulations on completing Module 2! You now have the tools to properly analyze and compare investment performance.
Module 3: Time Series Analysis
Course 3: Quantitative Finance & Portfolio Theory
Part 1: Statistical Foundations
Learning Objectives
By the end of this module, you will be able to:
- Test for stationarity and understand why it matters
- Analyze autocorrelation in financial returns
- Apply moving statistics (rolling, expanding, exponential)
- Understand and model volatility clustering
| Attribute | Value |
|---|---|
| Duration | ~2.5 hours |
| Exercises | 6 (3 guided + 3 open-ended) |
| Prerequisites | Modules 1-2 (Statistics, Return Analysis) |
Setup and Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
from scipy import stats
from statsmodels.tsa.stattools import adfuller, acf, pacf
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.stats.diagnostic import acorr_ljungbox
import warnings
warnings.filterwarnings('ignore')
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
np.random.seed(42)
print('Libraries loaded successfully!')
Load Financial Data
# Download stock data
tickers = ['SPY', 'AAPL', 'MSFT', 'GLD', 'TLT']
start_date = '2019-01-01'
end_date = '2024-01-01'
print(f'Downloading data for {tickers}...')
data = yf.download(tickers, start=start_date, end=end_date, progress=False)
# Handle different yfinance column structures
if isinstance(data.columns, pd.MultiIndex):
    if 'Adj Close' in data.columns.get_level_values(0):
        prices = data['Adj Close']
    elif 'Close' in data.columns.get_level_values(0):
        prices = data['Close']
    else:
        prices = data.xs('Close', axis=1, level=1) if 'Close' in data.columns.get_level_values(1) else data.iloc[:, :len(tickers)]
else:
    prices = data['Adj Close'] if 'Adj Close' in data.columns else data['Close']
# Calculate returns
returns = prices.pct_change().dropna()
log_returns = np.log(prices / prices.shift(1)).dropna()
print(f'Data range: {prices.index.min().date()} to {prices.index.max().date()}')
print(f'Trading days: {len(prices)}')
prices.tail()
Section 3.1: Stationarity
Stationarity is a fundamental concept in time series analysis. Most statistical models assume stationarity!
In this section, you will learn:
- What stationarity means and why it matters
- How to test for stationarity (ADF test)
- How to transform non-stationary data
3.1.1 What is Stationarity?
A time series is stationary if its statistical properties don't change over time:
- Constant mean: E[X_t] = μ for all t
- Constant variance: Var(X_t) = σ² for all t
- Covariance depends only on lag: Cov(X_t, X_{t+k}) depends only on k, not t
Why does it matter?
- Non-stationary data can lead to spurious correlations
- Most forecasting models require stationarity
- Statistical inference assumes constant distributions
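The constant-variance condition can be illustrated with simulated data. The sketch below compares white noise (stationary) with a random walk built from the same shocks (non-stationary) by measuring the variance across many simulated paths at two points in time. The path count, path length, and seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

n_paths, n_steps = 2000, 1000
shocks = rng.normal(0.0, 1.0, size=(n_paths, n_steps))

white_noise = shocks                  # X_t = e_t
random_walk = shocks.cumsum(axis=1)   # X_t = X_{t-1} + e_t

# Cross-sectional variance at an early and a late time step.
variances = {}
for t in (100, 1000):
    wn_var = white_noise[:, t - 1].var()
    rw_var = random_walk[:, t - 1].var()
    variances[t] = (wn_var, rw_var)
    print(f't = {t:>4}: white noise var = {wn_var:6.2f}, random walk var = {rw_var:8.2f}')
```

The white-noise variance stays near 1 at both times, while the random-walk variance grows roughly in proportion to t. Prices behave more like the random walk and returns more like the noise, which is why models are usually fit to returns.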
# Compare prices (non-stationary) vs returns (stationary)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
spy_prices = prices['SPY']
spy_returns = returns['SPY']
# Prices
ax1 = axes[0, 0]
ax1.plot(spy_prices)
ax1.set_title('SPY Prices (Non-Stationary)')
ax1.set_ylabel('Price')
# Returns
ax2 = axes[0, 1]
ax2.plot(spy_returns)
ax2.axhline(y=0, color='red', linestyle='--', alpha=0.5)
ax2.set_title('SPY Returns (Stationary)')
ax2.set_ylabel('Return')
# Rolling mean of prices
ax3 = axes[1, 0]
rolling_mean_prices = spy_prices.rolling(60).mean()
ax3.plot(spy_prices, alpha=0.5, label='Prices')
ax3.plot(rolling_mean_prices, color='red', lw=2, label='60-day Mean')
ax3.set_title('Prices: Mean Changes Over Time')
ax3.legend()
# Rolling mean of returns
ax4 = axes[1, 1]
rolling_mean_returns = spy_returns.rolling(60).mean()
ax4.plot(spy_returns, alpha=0.5, label='Returns')
ax4.plot(rolling_mean_returns, color='red', lw=2, label='60-day Mean')
ax4.axhline(y=0, color='black', linestyle='--', alpha=0.3)
ax4.set_title('Returns: Mean Stays Around Zero')
ax4.legend()
plt.tight_layout()
plt.show()
print('Notice: Prices have a clear trend (non-stationary).')
print('Returns fluctuate around zero with no trend (stationary).')
3.1.2 Testing for Stationarity: ADF Test
The Augmented Dickey-Fuller (ADF) test is the standard test for stationarity.
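Hypotheses:
- H₀: Series has a unit root (non-stationary)
- H₁: Series is stationary
Interpretation:
- p-value < 0.05 → Reject H₀ → Treat the series as stationary
- p-value ≥ 0.05 → Cannot reject H₀ → No evidence against a unit root, so treat the series as non-stationary (failing to reject does not prove non-stationarity)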
def test_stationarity(series: pd.Series, name: str = 'Series') -> bool:
    """
    Perform ADF test and print results.
    Args:
        series: Time series to test
        name: Name for display
    Returns:
        True if stationary, False otherwise
    """
    result = adfuller(series.dropna())
    print(f'=== ADF Test: {name} ===')
    print(f'Test Statistic: {result[0]:.4f}')
    print(f'p-value: {result[1]:.4f}')
    print(f'Critical Values:')
    for key, value in result[4].items():
        print(f'  {key}: {value:.4f}')
    if result[1] < 0.05:
        print(f'\nConclusion: {name} IS stationary (p < 0.05)')
        return True
    else:
        print(f'\nConclusion: {name} is NOT stationary (p >= 0.05)')
        return False
# Test prices
print('Testing SPY Prices...')
test_stationarity(prices['SPY'], 'SPY Prices')
print('\n' + '='*50 + '\n')
# Test returns
print('Testing SPY Returns...')
test_stationarity(returns['SPY'], 'SPY Returns')
3.1.3 Making Data Stationary
Common transformations to achieve stationarity:
- Differencing: X_t - X_{t-1} (converts prices to price changes; applied to log prices, it gives log returns)
- Log transformation: ln(X_t) (stabilizes variance)
- Log returns: ln(X_t / X_{t-1}) (combines both)
- Detrending: Remove linear or polynomial trend
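A small sketch with made-up prices (not market data) shows how these transformations relate to each other: log returns are exactly the first difference of log prices, and they equal ln(1 + simple return).

```python
import numpy as np
import pandas as pd

# Hypothetical price path, for illustration only
px = pd.Series([100.0, 102.0, 101.0, 105.0, 110.0])

price_diff = px.diff().dropna()                 # X_t - X_{t-1}, in price units
simple_ret = px.pct_change().dropna()           # X_t / X_{t-1} - 1
log_ret = np.log(px / px.shift(1)).dropna()     # ln(X_t / X_{t-1})
diff_of_logs = np.log(px).diff().dropna()       # ln(X_t) - ln(X_{t-1})

print(pd.DataFrame({'diff': price_diff, 'simple': simple_ret, 'log': log_ret}))
```

Plain differences stay in price units, so their scale grows with the price level, whereas simple and log returns are scale-free.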
# Different transformations
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
spy = prices['SPY']
# Original prices
ax1 = axes[0, 0]
ax1.plot(spy)
ax1.set_title('Original Prices (Non-Stationary)')
# First difference
ax2 = axes[0, 1]
diff = spy.diff().dropna()
ax2.plot(diff)
ax2.set_title('First Difference (Stationary)')
# Log prices
ax3 = axes[1, 0]
log_prices = np.log(spy)
ax3.plot(log_prices)
ax3.set_title('Log Prices (Still Non-Stationary)')
# Log returns
ax4 = axes[1, 1]
ax4.plot(log_returns['SPY'])
ax4.set_title('Log Returns (Stationary)')
plt.tight_layout()
plt.show()
# Test all transformations
print('=== Stationarity Test Results ===')
print()
transformations = {
'Original Prices': spy,
'First Difference': diff,
'Log Prices': log_prices,
'Log Returns': log_returns['SPY']
}
for name, series in transformations.items():
    result = adfuller(series.dropna())
    status = 'Stationary' if result[1] < 0.05 else 'Non-Stationary'
    print(f'{name:20} p-value: {result[1]:.4f} → {status}')
Exercise 3.1: Test Stationarity (Guided)
Your Task: Build a function that tests stationarity for multiple assets and returns a summary DataFrame.
Fill in the blanks:
Solution:
def stationarity_summary(prices_df: pd.DataFrame, returns_df: pd.DataFrame) -> pd.DataFrame:
    """
    Test stationarity for all assets, both prices and returns.
    """
    results = []
    for ticker in prices_df.columns:
        # Run ADF test on prices
        price_result = adfuller(prices_df[ticker].dropna())
        price_pval = price_result[1]
        # Run ADF test on returns
        return_result = adfuller(returns_df[ticker].dropna())
        return_pval = return_result[1]
        results.append({
            'Ticker': ticker,
            'Prices p-value': price_pval,
            'Prices Stationary': price_pval < 0.05,
            'Returns p-value': return_pval,
            'Returns Stationary': return_pval < 0.05
        })
    return pd.DataFrame(results).set_index('Ticker')
# Test
summary = stationarity_summary(prices, returns)
print(summary)
print("\nTypical finding: prices are non-stationary and returns are stationary. Check the table above for your data.")
Section 3.2: Autocorrelation
Autocorrelation measures how today's value relates to past values. It's key for understanding predictability.
In this section, you will learn:
- What autocorrelation means
- How to compute and visualize ACF/PACF
- What autocorrelation patterns tell us about markets
3.2.1 Understanding Autocorrelation
Autocorrelation at lag k: Correlation between X_t and X_{t-k}
$$\rho_k = \frac{Cov(X_t, X_{t-k})}{Var(X_t)}$$
Interpretation:
- ρ_k > 0: Positive values tend to follow positive values (momentum)
- ρ_k < 0: Positive values tend to follow negative values (mean reversion)
- ρ_k ≈ 0: No linear relationship (random/efficient)
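As a sanity check on the formula, the sketch below estimates ρ_k directly from its definition on a simulated AR(1) series. The simulated data, the coefficient 0.6, and the helper name `sample_autocorr` are assumptions for illustration, not part of the course data. For an AR(1) process with coefficient φ, the theoretical lag-k autocorrelation is φ^k, so the estimates should land close to 0.6, 0.36, and 0.216.

```python
import numpy as np

def sample_autocorr(x: np.ndarray, k: int) -> float:
    """Sample autocorrelation at lag k: Cov(X_t, X_{t-k}) / Var(X_t)."""
    x = x - x.mean()
    return float((x[k:] * x[:-k]).sum() / (x * x).sum())

rng = np.random.default_rng(1)
phi, n = 0.6, 5000
eps = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]      # positive phi -> momentum-like persistence

for k in (1, 2, 3):
    print(f"lag {k}: sample={sample_autocorr(x, k):.3f}, theory={phi**k:.3f}")
```

Flipping the sign of `phi` would produce alternating positive and negative autocorrelations, the mean-reversion pattern described above.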
# Calculate autocorrelation manually
spy_ret = returns['SPY']
print('=== Manual Autocorrelation Calculation ===')
print()
for lag in range(1, 6):
    # Correlation between returns and lagged returns
    autocorr = spy_ret.corr(spy_ret.shift(lag))
    print(f'Lag {lag}: {autocorr:.4f}')
print()
print('Values close to zero suggest returns are roughly uncorrelated with their own past.')
print('This is consistent with the Efficient Market Hypothesis!')
3.2.2 ACF and PACF Plots
ACF (Autocorrelation Function): Shows correlation at all lags
PACF (Partial Autocorrelation Function): Shows correlation at lag k after removing effects of lags 1 to k-1
The blue bands represent 95% confidence intervals. Significant autocorrelations extend beyond the bands.
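Under the null of no autocorrelation, the 95% band is approximately ±1.96/√n. A quick calculation, assuming roughly five years of daily data (n = 1,250 is an illustrative figure, not the actual SPY sample size):

```python
import math

n_obs = 1250                      # assumed sample size (~5 years of daily returns)
band = 1.96 / math.sqrt(n_obs)    # approximate 95% band under white noise
print(f"95% band: +/-{band:.4f}")
```

With a sample of this size, a lag-1 autocorrelation of 0.03 would sit inside the band, while 0.10 would be flagged as significant. The bands drawn by `plot_acf` may widen at higher lags, because statsmodels can apply Bartlett's formula instead of this constant approximation.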
# ACF and PACF for returns
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# SPY Returns ACF
plot_acf(returns['SPY'].dropna(), ax=axes[0, 0], lags=30, alpha=0.05)
axes[0, 0].set_title('SPY Returns - ACF')
# SPY Returns PACF
plot_pacf(returns['SPY'].dropna(), ax=axes[0, 1], lags=30, alpha=0.05)
axes[0, 1].set_title('SPY Returns - PACF')
# SPY Squared Returns ACF (volatility clustering)
plot_acf(returns['SPY'].dropna()**2, ax=axes[1, 0], lags=30, alpha=0.05)
axes[1, 0].set_title('SPY Squared Returns - ACF (Volatility Clustering)')
# SPY Absolute Returns ACF
plot_acf(returns['SPY'].dropna().abs(), ax=axes[1, 1], lags=30, alpha=0.05)
axes[1, 1].set_title('SPY Absolute Returns - ACF')
plt.tight_layout()
plt.show()
print('Returns (top): Little autocorrelation - consistent with an efficient market')
print('Squared Returns (bottom): Strong autocorrelation - volatility clusters!')
3.2.3 Testing for Significant Autocorrelation
The Ljung-Box test checks if there's any significant autocorrelation up to lag k.
- H₀: No autocorrelation up to lag k (returns are serially uncorrelated)
- H₁: Significant autocorrelation exists
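The statistic behind these hypotheses is Q = n(n+2) Σ_{k=1}^{h} ρ̂_k² / (n − k), which is compared with a chi-square distribution with h degrees of freedom. The sketch below computes Q by hand on simulated data. The two series and the helper name `ljung_box_q` are illustrative assumptions; `acorr_ljungbox` performs the same calculation and also returns p-values.

```python
import numpy as np

def ljung_box_q(x: np.ndarray, h: int) -> float:
    """Ljung-Box Q statistic using sample autocorrelations at lags 1..h."""
    n = len(x)
    xc = x - x.mean()
    denom = (xc * xc).sum()
    q = 0.0
    for k in range(1, h + 1):
        rho_k = (xc[k:] * xc[:-k]).sum() / denom
        q += rho_k**2 / (n - k)
    return float(n * (n + 2) * q)

rng = np.random.default_rng(2)
noise = rng.normal(size=2000)            # no autocorrelation by construction
ar1 = np.zeros(2000)                     # autocorrelated by construction
for t in range(1, 2000):
    ar1[t] = 0.3 * ar1[t - 1] + noise[t]

CHI2_95_DF10 = 18.307                    # 95% critical value, chi-square with 10 df
print(f"Q(white noise) = {ljung_box_q(noise, 10):.2f}")
print(f"Q(AR(1))       = {ljung_box_q(ar1, 10):.2f}  (reject H0 if > {CHI2_95_DF10})")
```

Because Q pools the squared autocorrelations of all lags up to h, it is a joint test: one large autocorrelation or many small ones can both push it over the critical value.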
def ljung_box_test(series: pd.Series, lags: int = 10, name: str = 'Series') -> pd.DataFrame:
    """
    Perform Ljung-Box test for autocorrelation.
    Args:
        series: Time series to test
        lags: Number of lags to test
        name: Name for display
    Returns:
        Test results DataFrame
    """
    result = acorr_ljungbox(series.dropna(), lags=lags, return_df=True)
    print(f'=== Ljung-Box Test: {name} ===')
    print(f'Testing for autocorrelation up to lag {lags}')
    print()
    # Row h is a joint test of lags 1..h, so the last row answers the question
    # "is there any autocorrelation up to `lags`?" without cherry-picking a lag
    joint_pval = result['lb_pvalue'].iloc[-1]
    significant_lags = (result['lb_pvalue'] < 0.05).sum()
    print(f'Joint p-value at lag {lags}: {joint_pval:.4f}')
    print(f'Horizons with p < 0.05: {significant_lags} out of {lags}')
    if joint_pval < 0.05:
        print('\nConclusion: Significant autocorrelation detected!')
    else:
        print('\nConclusion: No significant autocorrelation (consistent with EMH)')
    return result
# Test returns
print('--- Testing SPY Returns ---')
lb_returns = ljung_box_test(returns['SPY'], lags=10, name='SPY Returns')
print('\n' + '='*50 + '\n')
# Test squared returns
print('--- Testing SPY Squared Returns ---')
lb_squared = ljung_box_test(returns['SPY']**2, lags=10, name='SPY Squared Returns')
Exercise 3.2: Autocorrelation Analysis (Guided)
Your Task: Build a function that calculates autocorrelation for both returns and squared returns.
Fill in the blanks:
Solution:
def autocorrelation_analysis(returns_series: pd.Series, max_lag: int = 5) -> pd.DataFrame:
    """
    Calculate autocorrelation for returns and squared returns.
    """
    squared = returns_series ** 2
    results = []
    for lag in range(1, max_lag + 1):
        # Calculate autocorrelation of returns
        ret_acf = returns_series.corr(returns_series.shift(lag))
        # Calculate autocorrelation of squared returns
        sq_acf = squared.corr(squared.shift(lag))
        results.append({
            'Lag': lag,
            'Returns ACF': ret_acf,
            'Squared Returns ACF': sq_acf
        })
    return pd.DataFrame(results).set_index('Lag')
# Test
acf_df = autocorrelation_analysis(returns['AAPL'])
print(acf_df)
print("\nNote: Squared returns have MUCH higher autocorrelation!")
Section 3.3: Moving Statistics
Moving statistics help us analyze trends and patterns that change over time.
In this section, you will learn:
- Rolling (moving window) statistics
- Expanding (cumulative) statistics
- Exponentially weighted statistics
3.3.1 Rolling Statistics
Rolling statistics use a fixed window that moves through time.
Common applications:
- Rolling mean (moving average)
- Rolling volatility
- Rolling correlation
- Rolling Sharpe ratio
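A tiny example on made-up numbers makes the window mechanics explicit. With a window of 3, the first two outputs are NaN because a full window is not yet available.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])    # toy data, for illustration only
roll_mean = s.rolling(3).mean()              # mean of the current and previous 2 values
roll_std = s.rolling(3).std()                # sample std (ddof=1) over the same window
print(pd.DataFrame({'value': s, 'mean_3': roll_mean, 'std_3': roll_std}))
```

Passing `min_periods` to `rolling` produces estimates before the window is full, at the cost of noisier early values.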
# Calculate rolling statistics for SPY
spy_ret = returns['SPY']
# Rolling statistics with different windows
windows = [20, 60, 252] # 1 month, 3 months, 1 year
fig, axes = plt.subplots(3, 1, figsize=(14, 12))
# Rolling Mean
ax1 = axes[0]
ax1.plot(spy_ret, alpha=0.3, label='Daily Returns')
for window in windows:
    rolling_mean = spy_ret.rolling(window).mean()
    ax1.plot(rolling_mean, label=f'{window}-day Mean', linewidth=1.5)
ax1.axhline(y=0, color='black', linestyle='--', alpha=0.3)
ax1.set_title('Rolling Mean (Moving Average)')
ax1.legend()
# Rolling Volatility (annualized)
ax2 = axes[1]
for window in windows:
    rolling_vol = spy_ret.rolling(window).std() * np.sqrt(252)
    ax2.plot(rolling_vol, label=f'{window}-day Vol', linewidth=1.5)
ax2.set_title('Rolling Volatility (Annualized)')
ax2.set_ylabel('Volatility')
ax2.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'{x:.0%}'))
ax2.legend()
# Rolling Sharpe (annualized)
ax3 = axes[2]
for window in windows:
    rolling_sharpe = (spy_ret.rolling(window).mean() / spy_ret.rolling(window).std()) * np.sqrt(252)
    ax3.plot(rolling_sharpe, label=f'{window}-day Sharpe', linewidth=1.5)
ax3.axhline(y=0, color='black', linestyle='--', alpha=0.3)
ax3.axhline(y=1, color='green', linestyle='--', alpha=0.3, label='Sharpe = 1')
ax3.set_title('Rolling Sharpe Ratio')
ax3.legend()
plt.tight_layout()
plt.show()
3.3.2 Expanding Statistics
Expanding statistics include all data from the start up to the current point.
Useful for:
- Cumulative performance
- Building samples for statistical tests
- Comparing to "all-time" metrics
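On a toy series (hypothetical returns, not SPY), the expanding mean at each date uses every observation so far, and its last value equals the full-sample mean:

```python
import pandas as pd

s = pd.Series([0.02, -0.01, 0.03, 0.00])    # made-up daily returns
exp_mean = s.expanding().mean()              # mean of all observations up to each date
print(exp_mean.round(4).tolist())
print(f"Full-sample mean: {s.mean():.4f}")
```

This is why an expanding estimate always converges to the full-sample statistic at the final date.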
# Expanding statistics
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# Expanding mean
ax1 = axes[0, 0]
expanding_mean = spy_ret.expanding().mean()
ax1.plot(expanding_mean * 252, label='Expanding Mean (Annualized)')
ax1.axhline(y=spy_ret.mean() * 252, color='red', linestyle='--', label='Final Mean')
ax1.set_title('Expanding Mean Return')
ax1.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'{x:.1%}'))
ax1.legend()
# Expanding volatility
ax2 = axes[0, 1]
expanding_vol = spy_ret.expanding().std() * np.sqrt(252)
ax2.plot(expanding_vol, label='Expanding Vol (Annualized)')
ax2.axhline(y=spy_ret.std() * np.sqrt(252), color='red', linestyle='--', label='Final Vol')
ax2.set_title('Expanding Volatility')
ax2.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'{x:.0%}'))
ax2.legend()
# Expanding Sharpe
ax3 = axes[1, 0]
expanding_sharpe = (expanding_mean / spy_ret.expanding().std()) * np.sqrt(252)
ax3.plot(expanding_sharpe)
ax3.axhline(y=0, color='black', linestyle='--', alpha=0.3)
ax3.set_title('Expanding Sharpe Ratio')
# Cumulative return
ax4 = axes[1, 1]
cumulative_return = (1 + spy_ret).cumprod() - 1
ax4.plot(cumulative_return)
ax4.set_title('Cumulative Return')
ax4.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'{x:.0%}'))
plt.tight_layout()
plt.show()
3.3.3 Exponentially Weighted Statistics
Exponentially weighted statistics give more weight to recent observations.
Key parameter: span (or halflife)
- Smaller span = more weight on recent data
- Larger span = smoother, more like a simple average
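In pandas, `span` maps to a smoothing factor α = 2 / (span + 1), and with `adjust=False` the mean follows the recursion y_t = (1 − α)·y_{t−1} + α·x_t. The sketch below checks this on random data (simulated for illustration, not SPY returns).

```python
import numpy as np
import pandas as pd

span = 20
alpha = 2 / (span + 1)                      # pandas' span-to-alpha conversion
rng = np.random.default_rng(3)
x = pd.Series(rng.normal(0, 0.01, size=250))

# Manual recursion, equivalent to ewm(span=span, adjust=False).mean()
manual = np.empty(len(x))
manual[0] = x.iloc[0]
for t in range(1, len(x)):
    manual[t] = (1 - alpha) * manual[t - 1] + alpha * x.iloc[t]

pandas_ewm = x.ewm(span=span, adjust=False).mean()
max_diff = np.abs(manual - pandas_ewm.values).max()
print(f"alpha = {alpha:.4f}, max abs difference = {max_diff:.2e}")
```

The default `adjust=True` reweights the earliest observations to correct for the short history, but it converges to the same values once enough data has accumulated.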
# Exponentially weighted statistics
fig, axes = plt.subplots(2, 1, figsize=(14, 10))
# EWM Mean vs Rolling Mean
ax1 = axes[0]
ax1.plot(spy_ret, alpha=0.2, label='Daily Returns')
ax1.plot(spy_ret.rolling(20).mean(), label='20-day Rolling Mean', alpha=0.8)
ax1.plot(spy_ret.ewm(span=20).mean(), label='EWM Mean (span=20)', alpha=0.8)
ax1.axhline(y=0, color='black', linestyle='--', alpha=0.3)
ax1.set_title('Rolling Mean vs Exponentially Weighted Mean')
ax1.legend()
# EWM Volatility vs Rolling Volatility
ax2 = axes[1]
rolling_vol_60 = spy_ret.rolling(60).std() * np.sqrt(252)
ewm_vol_60 = spy_ret.ewm(span=60).std() * np.sqrt(252)
ax2.plot(rolling_vol_60, label='60-day Rolling Vol', alpha=0.8)
ax2.plot(ewm_vol_60, label='EWM Vol (span=60)', alpha=0.8)
ax2.set_title('Rolling Volatility vs Exponentially Weighted Volatility')
ax2.set_ylabel('Annualized Volatility')
ax2.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'{x:.0%}'))
ax2.legend()
plt.tight_layout()
plt.show()
print('EWM reacts faster to recent changes!')
Exercise 3.3: Rolling Correlation (Guided)
Your Task: Build a function that calculates rolling correlation between two assets.
Fill in the blanks:
Solution:
def rolling_correlation(series1: pd.Series, series2: pd.Series,
                        window: int = 60) -> pd.Series:
    """
    Calculate rolling correlation between two series.
    """
    rolling_corr = series1.rolling(window).corr(series2)
    return rolling_corr

def analyze_rolling_correlation(series1: pd.Series, series2: pd.Series,
                                name1: str, name2: str,
                                window: int = 60) -> dict:
    """
    Analyze rolling correlation with summary statistics.
    """
    rolling_corr = rolling_correlation(series1, series2, window)
    return {
        'overall': series1.corr(series2),
        'rolling_mean': rolling_corr.mean(),
        'rolling_min': rolling_corr.min(),
        'rolling_max': rolling_corr.max()
    }

# Test
stats = analyze_rolling_correlation(returns['SPY'], returns['AAPL'], 'SPY', 'AAPL')
for key, value in stats.items():
    print(f"{key}: {value:.3f}")
Section 3.4: Volatility Modeling
Volatility clustering is one of the most important stylized facts in finance. This section introduces volatility modeling.
In this section, you will learn:
- What volatility clustering means
- Simple volatility forecasting methods
- Introduction to GARCH concepts
3.4.1 Visualizing Volatility Clustering
# Visualize volatility clustering
spy_ret = returns['SPY']
fig, axes = plt.subplots(3, 1, figsize=(14, 12))
# Returns
ax1 = axes[0]
ax1.plot(spy_ret)
ax1.axhline(y=0, color='red', linestyle='--', alpha=0.5)
ax1.set_title('SPY Returns')
ax1.set_ylabel('Return')
# Absolute returns (volatility proxy)
ax2 = axes[1]
ax2.plot(spy_ret.abs(), alpha=0.7)
ax2.plot(spy_ret.abs().rolling(20).mean(), color='red', lw=2, label='20-day MA')
ax2.set_title('Absolute Returns (Volatility Proxy)')
ax2.set_ylabel('|Return|')
ax2.legend()
# Squared returns
ax3 = axes[2]
ax3.plot(spy_ret**2, alpha=0.7)
ax3.plot((spy_ret**2).rolling(20).mean(), color='red', lw=2, label='20-day MA')
ax3.set_title('Squared Returns (Variance Proxy)')
ax3.set_ylabel('Return²')
ax3.legend()
plt.tight_layout()
plt.show()
print('Notice the clustering: periods of high volatility are followed by high volatility!')
3.4.2 Simple Volatility Forecasting
Before jumping to complex models, let's try simple approaches:
- Historical volatility: Use past realized volatility
- EWMA: Exponentially weighted moving average
- Simple persistence: Tomorrow's vol = today's vol
# Calculate different volatility estimates
window = 20
# Historical (rolling) volatility
hist_vol = spy_ret.rolling(window).std() * np.sqrt(252)
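# EWMA volatility (in the spirit of RiskMetrics; note that ewm().std() demeans
# the returns and is parameterized by span rather than a fixed decay factor)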
ewma_vol = spy_ret.ewm(span=window).std() * np.sqrt(252)
fig, ax = plt.subplots(figsize=(14, 6))
ax.plot(hist_vol, label=f'{window}-day Historical Vol', alpha=0.8)
ax.plot(ewma_vol, label=f'EWMA Vol (span={window})', alpha=0.8)
ax.set_title('Volatility Estimates Comparison')
ax.set_ylabel('Annualized Volatility')
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'{x:.0%}'))
ax.legend()
plt.tight_layout()
plt.show()
print('EWMA responds faster to changes due to exponential weighting.')
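The pandas call `ewm(span=...).std()` is a convenient approximation rather than the exact RiskMetrics recursion σ²_t = λσ²_{t−1} + (1 − λ)r²_{t−1}. A minimal sketch of that recursion follows, assuming λ = 0.94 (the value commonly cited for daily data) and a hypothetical helper named `ewma_variance`. The input is simulated with a deliberate jump in volatility, so it is not SPY data.

```python
import numpy as np

def ewma_variance(r: np.ndarray, lam: float = 0.94) -> np.ndarray:
    """One-step-ahead EWMA variance: s2[t] = lam * s2[t-1] + (1 - lam) * r[t-1]**2."""
    s2 = np.empty(len(r))
    s2[0] = r.var()                      # initialize with the sample variance
    for t in range(1, len(r)):
        s2[t] = lam * s2[t - 1] + (1 - lam) * r[t - 1] ** 2
    return s2

rng = np.random.default_rng(4)
# Simulated returns: calm regime (1% daily vol) followed by a turbulent one (3%)
r = np.concatenate([rng.normal(0, 0.01, 500), rng.normal(0, 0.03, 250)])

vol = np.sqrt(ewma_variance(r) * 252)    # annualized
print(f"EWMA vol at end of calm period:      {vol[499]:.1%}")
print(f"EWMA vol at end of turbulent period: {vol[-1]:.1%}")
```

Each variance value uses only returns up to the previous day, so it can be read as a forecast for day t made at the close of day t-1.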
3.4.3 Introduction to GARCH
GARCH (Generalized Autoregressive Conditional Heteroskedasticity) is the standard model for volatility.
GARCH(1,1) Model:
$$\sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2$$
Where:
- ω = constant term (the long-run variance is ω / (1 − α − β))
- α = weight on the most recent shock (yesterday's squared return)
- β = weight on yesterday's conditional variance
Interpretation:
- High α: Volatility reacts strongly to shocks
- High β: Volatility is persistent
- α + β close to 1: High persistence (shocks to volatility die out slowly)
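A worked example helps with reading the parameters. When α + β < 1, the recursion implies an unconditional (long-run) variance of ω / (1 − α − β), and a shock to variance decays with half-life ln(0.5) / ln(α + β). Using ω = 0.00001, α = 0.1, β = 0.85 (illustrative values, not estimates fitted to data):

```python
import math

omega, alpha, beta = 0.00001, 0.10, 0.85      # illustrative, not estimated from data

persistence = alpha + beta
long_run_var = omega / (1 - persistence)       # unconditional daily variance
long_run_vol = math.sqrt(long_run_var * 252)   # annualized volatility
half_life = math.log(0.5) / math.log(persistence)

print(f"Persistence (alpha + beta): {persistence:.2f}")
print(f"Long-run annualized vol:    {long_run_vol:.1%}")
print(f"Shock half-life:            {half_life:.1f} days")
```

These values imply a long-run volatility of roughly 22% per year and a half-life of about two to three trading weeks. In practice the parameters are estimated by maximum likelihood rather than chosen by hand.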
def simple_garch_variance(returns: pd.Series, omega: float = 0.00001,
                          alpha: float = 0.1, beta: float = 0.85) -> pd.Series:
    """
    Simple GARCH(1,1) variance calculation with fixed (not estimated) parameters.
    sigma2_t = omega + alpha * epsilon2_{t-1} + beta * sigma2_{t-1}
    Args:
        returns: Return series
        omega: Constant term
        alpha: Weight on recent shock
        beta: Weight on recent variance
    Returns:
        Variance series
    """
    n = len(returns)
    sigma2 = np.zeros(n)
    # Initialize with sample variance
    sigma2[0] = returns.var()
    for t in range(1, n):
        # The raw return is used as the shock, i.e. the mean return is assumed to be zero
        sigma2[t] = omega + alpha * returns.iloc[t-1]**2 + beta * sigma2[t-1]
    return pd.Series(sigma2, index=returns.index)
# Calculate GARCH variance
garch_var = simple_garch_variance(spy_ret)
garch_vol = np.sqrt(garch_var) * np.sqrt(252) # Annualized
# Plot comparison
fig, ax = plt.subplots(figsize=(14, 6))
ax.plot(hist_vol, label='Historical Vol (20-day)', alpha=0.8)
ax.plot(ewma_vol, label='EWMA Vol', alpha=0.8)
ax.plot(garch_vol, label='GARCH-like Vol', alpha=0.8)
ax.set_title('Volatility Model Comparison')
ax.set_ylabel('Annualized Volatility')
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'{x:.0%}'))
ax.legend()
plt.tight_layout()
plt.show()
print('GARCH captures volatility clustering and adapts to changing market conditions.')
Exercise 3.4: Volatility Forecasting Comparison (Open-ended)
Your Task:
Build a function that:
- Calculates historical and EWMA volatility forecasts
- Evaluates forecast accuracy using RMSE
- Returns a comparison report
Your implementation:
Solution:
def compare_volatility_forecasts(returns: pd.Series, window: int = 20) -> dict:
    """
    Compare historical and EWMA volatility forecasts.
    Args:
        returns: Return series
        window: Window size for historical vol
    Returns:
        Dictionary with comparison metrics
    """
    # Calculate volatility estimates
    hist_vol = returns.rolling(window).std()
    ewma_vol = returns.ewm(span=window).std()
    # Shift forecasts (use yesterday's estimate to predict today)
    hist_forecast = hist_vol.shift(1)
    ewma_forecast = ewma_vol.shift(1)
    # Realized volatility proxy: absolute return.
    # This proxy is noisy, and under normality E|r| is about 0.8 * sigma, so the
    # RMSE levels are biased upward; they remain comparable across the two models.
    realized = returns.abs()

    # Calculate RMSE
    def rmse(forecast, realized):
        diff = (forecast - realized).dropna()
        return np.sqrt((diff ** 2).mean())

    hist_rmse = rmse(hist_forecast, realized)
    ewma_rmse = rmse(ewma_forecast, realized)
    # Calculate correlation with realized
    hist_corr = hist_forecast.corr(realized)
    ewma_corr = ewma_forecast.corr(realized)
    return {
        'hist_rmse': hist_rmse,
        'ewma_rmse': ewma_rmse,
        'hist_corr': hist_corr,
        'ewma_corr': ewma_corr,
        'better_model': 'EWMA' if ewma_rmse < hist_rmse else 'Historical'
    }
# Test
comparison = compare_volatility_forecasts(returns['SPY'])
print("=== Volatility Forecast Comparison ===")
print(f"Historical RMSE: {comparison['hist_rmse']:.6f}")
print(f"EWMA RMSE: {comparison['ewma_rmse']:.6f}")
print(f"Historical Correlation: {comparison['hist_corr']:.4f}")
print(f"EWMA Correlation: {comparison['ewma_corr']:.4f}")
print(f"Better Model: {comparison['better_model']}")
Exercise 3.5: Crisis Detection Tool (Open-ended)
Your Task:
Build a tool that:
- Detects high-volatility regimes using rolling volatility
- Identifies crisis periods (volatility > 2 standard deviations above mean)
- Returns dates and duration of crisis periods
Your implementation:
Solution:
class CrisisDetector:
    """
    Detect high-volatility crisis periods.
    """
    def __init__(self, returns: pd.Series, vol_window: int = 20, threshold_std: float = 2.0):
        self.returns = returns
        self.vol_window = vol_window
        self.threshold_std = threshold_std
        # Calculate rolling volatility
        self.rolling_vol = returns.rolling(vol_window).std() * np.sqrt(252)
        # Calculate threshold
        self.vol_mean = self.rolling_vol.mean()
        self.vol_std = self.rolling_vol.std()
        self.threshold = self.vol_mean + threshold_std * self.vol_std

    def detect_crises(self) -> pd.DataFrame:
        """Identify crisis periods."""
        is_crisis = self.rolling_vol > self.threshold
        # Find crisis start and end dates
        crisis_changes = is_crisis.astype(int).diff()
        starts = crisis_changes[crisis_changes == 1].index
        ends = crisis_changes[crisis_changes == -1].index
        crises = []
        for i, start in enumerate(starts):
            # Find corresponding end
            possible_ends = ends[ends > start]
            if len(possible_ends) > 0:
                end = possible_ends[0]
            else:
                # Crisis still ongoing at the end of the sample
                end = self.returns.index[-1]
            duration = (end - start).days  # calendar days, not trading days
            max_vol = self.rolling_vol[start:end].max()
            crises.append({
                'Start': start,
                'End': end,
                'Duration (days)': duration,
                'Max Vol': max_vol
            })
        return pd.DataFrame(crises)

    def plot(self):
        """Plot volatility with crisis highlighting."""
        fig, ax = plt.subplots(figsize=(14, 6))
        ax.plot(self.rolling_vol, label='Rolling Vol')
        ax.axhline(y=self.threshold, color='red', linestyle='--',
                   label=f'Crisis Threshold ({self.threshold:.1%})')
        # Highlight crisis periods
        is_crisis = self.rolling_vol > self.threshold
        ax.fill_between(self.rolling_vol.index, 0, self.rolling_vol.max(),
                        where=is_crisis, alpha=0.3, color='red', label='Crisis Period')
        ax.set_title('Volatility with Crisis Detection')
        ax.set_ylabel('Annualized Volatility')
        ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'{x:.0%}'))
        ax.legend()
        plt.tight_layout()
        plt.show()
# Test
detector = CrisisDetector(returns['SPY'])
crises = detector.detect_crises()
print("=== Crisis Periods Detected ===")
print(crises)
detector.plot()
Exercise 3.6: Time Series Analysis Dashboard (Open-ended)
Your Task:
Build a comprehensive time series analysis class that:
- Tests for stationarity (prices and returns)
- Calculates autocorrelation statistics
- Computes rolling statistics (mean, vol, Sharpe)
- Generates a complete analysis report
Your implementation:
Solution:
class TimeSeriesAnalyzer:
    """
    Comprehensive time series analysis tool.
    """
    def __init__(self, prices: pd.Series, name: str = 'Asset'):
        self.prices = prices
        self.returns = prices.pct_change().dropna()
        self.name = name

    def stationarity_test(self) -> dict:
        """Test stationarity for prices and returns."""
        price_adf = adfuller(self.prices.dropna())
        return_adf = adfuller(self.returns.dropna())
        return {
            'prices_pvalue': price_adf[1],
            'prices_stationary': price_adf[1] < 0.05,
            'returns_pvalue': return_adf[1],
            'returns_stationary': return_adf[1] < 0.05
        }

    def autocorrelation_test(self, lags: int = 10) -> dict:
        """Test autocorrelation in returns and squared returns."""
        lb_returns = acorr_ljungbox(self.returns.dropna(), lags=lags, return_df=True)
        lb_squared = acorr_ljungbox(self.returns.dropna()**2, lags=lags, return_df=True)
        return {
            'returns_min_pvalue': lb_returns['lb_pvalue'].min(),
            'returns_autocorrelated': lb_returns['lb_pvalue'].min() < 0.05,
            'squared_min_pvalue': lb_squared['lb_pvalue'].min(),
            'squared_autocorrelated': lb_squared['lb_pvalue'].min() < 0.05
        }

    def rolling_stats(self, window: int = 60) -> dict:
        """Calculate current rolling statistics."""
        rolling_mean = self.returns.rolling(window).mean().iloc[-1] * 252
        rolling_vol = self.returns.rolling(window).std().iloc[-1] * np.sqrt(252)
        rolling_sharpe = rolling_mean / rolling_vol if rolling_vol > 0 else 0
        return {
            'rolling_mean_annual': rolling_mean,
            'rolling_vol_annual': rolling_vol,
            'rolling_sharpe': rolling_sharpe
        }

    def generate_report(self) -> str:
        """Generate comprehensive analysis report."""
        stat = self.stationarity_test()
        acf = self.autocorrelation_test()
        rolling = self.rolling_stats()
        report = f"""
========================================
TIME SERIES ANALYSIS: {self.name}
========================================
STATIONARITY:
Prices: {'Stationary' if stat['prices_stationary'] else 'Non-Stationary'} (p={stat['prices_pvalue']:.4f})
Returns: {'Stationary' if stat['returns_stationary'] else 'Non-Stationary'} (p={stat['returns_pvalue']:.4f})
AUTOCORRELATION:
Returns: {'Yes' if acf['returns_autocorrelated'] else 'No'} (p={acf['returns_min_pvalue']:.4f})
Squared Returns: {'Yes' if acf['squared_autocorrelated'] else 'No'} (p={acf['squared_min_pvalue']:.4f})
ROLLING STATS (60-day):
Annualized Return: {rolling['rolling_mean_annual']:.2%}
Annualized Vol: {rolling['rolling_vol_annual']:.2%}
Sharpe Ratio: {rolling['rolling_sharpe']:.3f}
========================================
"""
        return report
# Test
analyzer = TimeSeriesAnalyzer(prices['MSFT'], 'MSFT')
print(analyzer.generate_report())
Module Project: Complete Time Series Analysis Report
Create a comprehensive time series analysis for MSFT.
Your Challenge:
Analyze MSFT and produce a report that includes:
1. Stationarity test (prices vs returns)
2. Autocorrelation analysis (returns and squared returns)
3. Rolling statistics (60-day mean, volatility, Sharpe)
4. Volatility forecast comparison (Historical vs EWMA)
# YOUR CODE HERE - Create a comprehensive time series analysis
Solution:
# Complete Time Series Analysis for MSFT
msft_prices = prices['MSFT']
msft_returns = returns['MSFT']
print('='*60)
print('MSFT TIME SERIES ANALYSIS REPORT')
print('='*60)
# 1. Stationarity Tests
print('\n--- 1. STATIONARITY ANALYSIS ---\n')
price_adf = adfuller(msft_prices.dropna())
return_adf = adfuller(msft_returns.dropna())
print(f'Prices ADF p-value: {price_adf[1]:.4f} ({"Stationary" if price_adf[1] < 0.05 else "Non-Stationary"})')
print(f'Returns ADF p-value: {return_adf[1]:.4f} ({"Stationary" if return_adf[1] < 0.05 else "Non-Stationary"})')
# 2. Autocorrelation Analysis
print('\n--- 2. AUTOCORRELATION ANALYSIS ---\n')
print('Returns Autocorrelation (lags 1-5):')
for lag in range(1, 6):
    acf_val = msft_returns.corr(msft_returns.shift(lag))
    print(f' Lag {lag}: {acf_val:.4f}')
print('\nSquared Returns Autocorrelation (lags 1-5):')
msft_sq = msft_returns ** 2
for lag in range(1, 6):
    acf_val = msft_sq.corr(msft_sq.shift(lag))
    print(f' Lag {lag}: {acf_val:.4f}')
# Ljung-Box test
lb_returns = acorr_ljungbox(msft_returns.dropna(), lags=10, return_df=True)
lb_squared = acorr_ljungbox(msft_sq.dropna(), lags=10, return_df=True)
print(f'\nLjung-Box Test (lag 10):')
print(f' Returns p-value: {lb_returns["lb_pvalue"].iloc[-1]:.4f}')
print(f' Squared Returns p-value: {lb_squared["lb_pvalue"].iloc[-1]:.4f}')
# 3. Rolling Statistics
print('\n--- 3. ROLLING STATISTICS (60-day) ---\n')
rolling_mean = msft_returns.rolling(60).mean() * 252
rolling_vol = msft_returns.rolling(60).std() * np.sqrt(252)
rolling_sharpe = rolling_mean / rolling_vol
print(f'Current 60-day Mean Return (Ann): {rolling_mean.iloc[-1]:.2%}')
print(f'Current 60-day Volatility (Ann): {rolling_vol.iloc[-1]:.2%}')
print(f'Current 60-day Sharpe Ratio: {rolling_sharpe.iloc[-1]:.3f}')
# 4. Volatility Comparison
print('\n--- 4. VOLATILITY FORECAST COMPARISON ---\n')
hist_vol = msft_returns.rolling(20).std() * np.sqrt(252)
ewma_vol = msft_returns.ewm(span=20).std() * np.sqrt(252)
realized_var = msft_returns ** 2
hist_forecast = (hist_vol.shift(1) / np.sqrt(252)) ** 2
ewma_forecast = (ewma_vol.shift(1) / np.sqrt(252)) ** 2
hist_rmse = np.sqrt(((hist_forecast - realized_var).dropna() ** 2).mean())
ewma_rmse = np.sqrt(((ewma_forecast - realized_var).dropna() ** 2).mean())
print(f'Historical Vol RMSE: {hist_rmse:.6f}')
print(f'EWMA Vol RMSE: {ewma_rmse:.6f}')
print(f'\nBetter Model: {"EWMA" if ewma_rmse < hist_rmse else "Historical"}')
print('\n' + '='*60)
print('END OF REPORT')
print('='*60)
Key Takeaways
What You Learned
1. Stationarity
- Prices are non-stationary: They trend over time (random walk)
- Returns are stationary: They fluctuate around a constant mean
- Prefer returns (or another stationary transformation) for statistical modeling
- Use the ADF test to formally test stationarity
2. Autocorrelation
- Returns have little autocorrelation: Consistent with market efficiency
- Squared returns have strong autocorrelation: Volatility clustering
- Use ACF/PACF plots to visualize correlation structure
- Ljung-Box test for formal significance testing
3. Moving Statistics
- Rolling: Fixed window, good for trends
- Expanding: Cumulative, good for overall statistics
- EWM: Weighted toward recent, good for volatility
- Window size is a bias-variance tradeoff
4. Volatility Modeling
- Volatility clusters: Big moves follow big moves
- EWMA reacts faster than simple rolling windows
- GARCH is the standard model for volatility
- Volatility is far more predictable than returns
Key Formulas
| Concept | Formula/Test |
|---|---|
| Stationarity | ADF test (p < 0.05 = stationary) |
| Autocorrelation | ρ_k = Cov(X_t, X_{t-k}) / Var(X_t) |
| Rolling Mean | mean over the last w observations, [t-w+1, t] |
| Rolling Vol | σ_t = std([t-w+1, t]) × √252 |
| EWMA Vol | σ²_t = λσ²_{t-1} + (1-λ)r²_{t-1} |
| GARCH(1,1) | σ²_t = ω + αε²_{t-1} + βσ²_{t-1} |
Coming Up Next
In Part 2: Portfolio Theory, we'll learn:
- Modern Portfolio Theory (Markowitz)
- Mean-Variance Optimization
- Efficient Frontier
- Capital Asset Pricing Model (CAPM)
Congratulations on completing Part 1: Statistical Foundations! You now have the statistical toolkit for quantitative finance.
Module 4: Modern Portfolio Theory
Course 3: Quantitative Finance
Part 2: Portfolio Theory
Learning Objectives
By the end of this module, you will be able to:
- Calculate portfolio returns and risk using matrix operations
- Understand and quantify the diversification effect
- Analyze the risk-return tradeoff
- Build and analyze two-asset portfolios
| Attribute | Value |
|---|---|
| Duration | ~2.5 hours |
| Exercises | 6 (3 guided + 3 open-ended) |
| Prerequisites | Module 3: Time Series Analysis |
Setup and Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')
pd.set_option('display.float_format', '{:.4f}'.format)
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
np.random.seed(42)
print('Libraries loaded successfully!')
Load Data
# Download data for a diverse portfolio
tickers = ['AAPL', 'MSFT', 'JNJ', 'XOM', 'GLD'] # Tech, Healthcare, Energy, Gold
end_date = datetime.now()
start_date = end_date - timedelta(days=5*365)
print("Downloading portfolio data...")
data = yf.download(tickers, start=start_date, end=end_date, progress=False)
# Handle MultiIndex columns
if isinstance(data.columns, pd.MultiIndex):
    if 'Adj Close' in data.columns.get_level_values(0):
        prices = data['Adj Close']
    elif 'Close' in data.columns.get_level_values(0):
        prices = data['Close']
    else:
        prices = data.xs('Close', axis=1, level=1) if 'Close' in data.columns.get_level_values(1) else data.iloc[:, :len(tickers)]
else:
    prices = data['Adj Close'] if 'Adj Close' in data.columns else data['Close']
prices.columns = [str(col) for col in prices.columns]
# yfinance may return the columns in alphabetical order; restore the order of `tickers`
# so that weight vectors and plot labels line up with the right assets
prices = prices[tickers]
returns = prices.pct_change().dropna()
# Calculate annualized statistics
annual_returns = returns.mean() * 252
annual_volatility = returns.std() * np.sqrt(252)
cov_matrix = returns.cov() * 252
n_assets = len(tickers)
print(f"\nData loaded: {len(prices)} trading days")
print(f"Assets: {list(prices.columns)}")
Section 4.1: Portfolio Returns & Risk
In 1952, Harry Markowitz revolutionized finance with Modern Portfolio Theory (MPT). The key insight: don't put all your eggs in one basket.
In this section, you will learn:
- How to calculate portfolio returns as weighted averages
- Why portfolio risk is NOT a weighted average
- The role of the covariance matrix
4.1.1 Portfolio Return Formula
The portfolio return is the weighted average of individual asset returns:
$$R_p = \sum_{i=1}^{n} w_i R_i = w_1 R_1 + w_2 R_2 + ... + w_n R_n$$
Where:
- $R_p$ = Portfolio return
- $w_i$ = Weight of asset $i$
- $R_i$ = Return of asset $i$
- $\sum w_i = 1$ (weights sum to 100%)
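For instance, with hypothetical weights and expected returns (made-up numbers, not the downloaded data), the formula reduces to a single dot product:

```python
import numpy as np

w = np.array([0.5, 0.3, 0.2])        # weights, must sum to 1
r = np.array([0.10, 0.06, 0.02])     # hypothetical annual expected returns

assert np.isclose(w.sum(), 1.0)
port_ret = w @ r                     # 0.5*0.10 + 0.3*0.06 + 0.2*0.02
print(f"Portfolio return: {port_ret:.2%}")
```

The result, 7.2%, always lies between the lowest and highest individual returns when all weights are non-negative.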
# Display individual asset statistics
print("Individual Asset Statistics (Annualized)")
print("=" * 50)
stats_df = pd.DataFrame({
'Expected Return': annual_returns,
'Volatility': annual_volatility,
'Sharpe (rf=0)': annual_returns / annual_volatility
})
print(stats_df.round(4))
# Create an equal-weighted portfolio
equal_weights = np.array([1/n_assets] * n_assets)
# Portfolio expected return (weighted average)
portfolio_return = np.dot(equal_weights, annual_returns)
print(f"Equal-weighted portfolio: {dict(zip(tickers, equal_weights.round(4)))}")
print(f"\nPortfolio Expected Return: {portfolio_return:.4f} ({portfolio_return*100:.2f}%)")
# Naive expectation of volatility (weighted average - WRONG!)
naive_volatility = np.dot(equal_weights, annual_volatility)
print(f"Naive Portfolio Volatility (weighted avg): {naive_volatility:.4f} ({naive_volatility*100:.2f}%)")
4.1.2 Portfolio Risk Formula
Portfolio risk is NOT a simple weighted average! It depends on how assets move together (covariance):
$$\sigma_p^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} w_i w_j \sigma_{ij}$$
Or in matrix notation:
$$\sigma_p^2 = \mathbf{w}^T \Sigma \mathbf{w}$$
Where:
- $\sigma_p^2$ = Portfolio variance
- $\sigma_{ij}$ = Covariance between assets $i$ and $j$ (with $\sigma_{ii} = \sigma_i^2$)
- $\Sigma$ = Covariance matrix
- $\mathbf{w}$ = Weight vector
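To see that the double sum and the matrix form agree, and that the result differs from a weighted average of volatilities, consider a two-asset sketch. The inputs are assumed for illustration: volatilities of 20% and 30%, a correlation of 0.2, and equal weights.

```python
import numpy as np

w = np.array([0.5, 0.5])
vols = np.array([0.20, 0.30])                # hypothetical annual volatilities
rho = 0.2                                    # hypothetical correlation
corr = np.array([[1.0, rho], [rho, 1.0]])
cov = np.outer(vols, vols) * corr            # sigma_ij = rho_ij * sigma_i * sigma_j

var_matrix = w @ cov @ w                     # w' Sigma w
var_double_sum = sum(w[i] * w[j] * cov[i, j] for i in range(2) for j in range(2))

port_vol = np.sqrt(var_matrix)
naive_vol = w @ vols                         # weighted average of volatilities
print(f"Portfolio vol: {port_vol:.2%} vs weighted average: {naive_vol:.2%}")
```

The portfolio volatility comes out below the 25% weighted average because the two assets are less than perfectly correlated.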
# Display the covariance matrix
print("Annualized Covariance Matrix")
print("=" * 60)
print(cov_matrix.round(4))
# Calculate TRUE portfolio volatility using matrix math
portfolio_variance = np.dot(equal_weights.T, np.dot(cov_matrix, equal_weights))
portfolio_volatility = np.sqrt(portfolio_variance)
print("Portfolio Risk Calculation")
print("=" * 50)
print(f"True Portfolio Volatility: {portfolio_volatility:.4f} ({portfolio_volatility*100:.2f}%)")
print(f"Naive (weighted avg): {naive_volatility:.4f} ({naive_volatility*100:.2f}%)")
print(f"\nRisk Reduction from Diversification: {(1 - portfolio_volatility/naive_volatility)*100:.2f}%")
Exercise 4.1: Calculate Custom Portfolio Statistics (Guided)
Your Task: Create a custom portfolio with weights [30%, 25%, 20%, 15%, 10%] and calculate its return and volatility.
Fill in the blanks to complete the function:
Solution:
def calculate_portfolio_stats(weights: np.ndarray,
expected_returns: pd.Series,
cov_matrix: pd.DataFrame) -> dict:
"""Calculate portfolio return and volatility."""
port_return = np.dot(weights, expected_returns)
port_variance = np.dot(weights.T, np.dot(cov_matrix, weights))
port_volatility = np.sqrt(port_variance)
return {
'return': port_return,
'volatility': port_volatility,
'sharpe': port_return / port_volatility
}
custom_weights = np.array([0.30, 0.25, 0.20, 0.15, 0.10])
stats = calculate_portfolio_stats(custom_weights, annual_returns, cov_matrix)
print(f"Return: {stats['return']*100:.2f}%, Volatility: {stats['volatility']*100:.2f}%")
Section 4.2: Diversification
Diversification is the "only free lunch in finance" - you can reduce risk without reducing expected return.
In this section, you will learn:
- The role of correlation in diversification
- How risk decreases as we add more assets
- The difference between systematic and idiosyncratic risk
4.2.1 The Role of Correlation
Diversification benefits depend on correlation between assets:
| Correlation | Diversification Benefit |
|---|---|
| ρ = +1 | None (assets move perfectly together) |
| ρ = 0 | Good (assets are uncorrelated) |
| ρ = -1 | Perfect (with the right weights, risk can be eliminated entirely) |
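The three rows can be checked with a simple special case. Assume, purely for illustration, two assets with the same volatility $\sigma$ held 50/50. The two-asset case of the portfolio variance formula from Section 4.1.2 then gives:

$$\sigma_p^2 = \tfrac{1}{4}\sigma^2 + \tfrac{1}{4}\sigma^2 + 2 \cdot \tfrac{1}{4}\,\rho\,\sigma^2 = \frac{1+\rho}{2}\,\sigma^2 \quad\Rightarrow\quad \sigma_p = \sigma\sqrt{\frac{1+\rho}{2}}$$

So $\rho = +1$ gives $\sigma_p = \sigma$ (no benefit), $\rho = 0$ gives $\sigma_p \approx 0.707\,\sigma$, and $\rho = -1$ gives $\sigma_p = 0$.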
# Calculate and visualize correlation matrix
corr_matrix = returns.corr()
fig, ax = plt.subplots(figsize=(8, 6))
im = ax.imshow(corr_matrix, cmap='RdYlGn', vmin=-1, vmax=1)
ax.set_xticks(range(len(tickers)))
ax.set_yticks(range(len(tickers)))
ax.set_xticklabels(tickers)
ax.set_yticklabels(tickers)
for i in range(len(tickers)):
for j in range(len(tickers)):
ax.text(j, i, f'{corr_matrix.iloc[i, j]:.2f}',
ha='center', va='center', fontsize=12)
plt.colorbar(im, label='Correlation')
plt.title('Asset Correlation Matrix', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()
4.2.2 Diversification by Number of Assets
# Simulate diversification effect
np.random.seed(42)
n_simulations = 1000
asset_counts = range(1, n_assets + 1)
avg_volatilities = []
for n in asset_counts:
volatilities = []
for _ in range(n_simulations):
selected = np.random.choice(n_assets, n, replace=False)
w = np.zeros(n_assets)
w[selected] = 1/n
vol = np.sqrt(np.dot(w.T, np.dot(cov_matrix, w)))
volatilities.append(vol)
avg_volatilities.append(np.mean(volatilities))
plt.figure(figsize=(10, 6))
plt.plot(asset_counts, avg_volatilities, 'bo-', linewidth=2, markersize=10)
plt.axhline(y=avg_volatilities[-1], color='g', linestyle='--', alpha=0.7,
label=f'Fully diversified: {avg_volatilities[-1]*100:.1f}%')
plt.axhline(y=annual_volatility.mean(), color='r', linestyle='--', alpha=0.7,
label=f'Avg single asset: {annual_volatility.mean()*100:.1f}%')
plt.xlabel('Number of Assets', fontsize=12)
plt.ylabel('Average Portfolio Volatility', fontsize=12)
plt.title('Diversification Effect', fontsize=14, fontweight='bold')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
4.2.3 Systematic vs. Idiosyncratic Risk
Diversification can only eliminate idiosyncratic (unsystematic) risk, which is the risk specific to individual assets.
Systematic risk (market risk) cannot be diversified away because it affects all assets.
When the two sources are uncorrelated, the decomposition is additive in variance (not in volatility):
$$\text{Total Risk} = \text{Systematic Risk} + \text{Idiosyncratic Risk}$$
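To see why some risk remains however many assets are added, consider a stylized case in which every asset has the same volatility $\sigma$ and every pair has the same correlation $\rho$ (both assumptions made purely for illustration). The equal-weighted portfolio variance is then $\rho\sigma^2 + (1-\rho)\sigma^2/n$: the second term shrinks toward zero as $n$ grows, while the first does not depend on $n$.

```python
import numpy as np

def equal_weight_vol(n: int, sigma: float, rho: float) -> float:
    """Volatility of an equal-weighted portfolio of n assets that share
    the same volatility `sigma` and pairwise correlation `rho`."""
    # Covariance matrix: sigma^2 on the diagonal, rho * sigma^2 elsewhere
    cov = np.full((n, n), rho * sigma**2)
    np.fill_diagonal(cov, sigma**2)
    w = np.full(n, 1.0 / n)
    return float(np.sqrt(w @ cov @ w))

sigma, rho = 0.30, 0.25            # hypothetical inputs
floor = sigma * np.sqrt(rho)       # limit as n grows: sqrt(rho) * sigma

for n in (1, 2, 5, 20, 100):
    print(f"n={n:>3}: vol={equal_weight_vol(n, sigma, rho):.4f}")
print(f"Undiversifiable floor: {floor:.4f}")
```

The printed volatilities fall quickly at first and then flatten out above the floor, which is the same shape as the simulated curve in Section 4.2.2.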
Exercise 4.2: Find Best Diversification Pairs (Guided)
Your Task: Loop through all asset pairs and find the 50/50 portfolio with the lowest volatility.
Fill in the blanks:
Solution:
def find_best_pair(returns: pd.DataFrame, tickers: list) -> tuple:
"""Find the two-asset 50/50 portfolio with lowest volatility."""
best_pair = None
best_volatility = float('inf')
for i in range(len(tickers)):
for j in range(i+1, len(tickers)):
pair_returns = returns[[tickers[i], tickers[j]]]
pair_cov = pair_returns.cov() * 252
weights = np.array([0.5, 0.5])
port_vol = np.sqrt(np.dot(weights.T, np.dot(pair_cov, weights)))
if port_vol < best_volatility:
best_volatility = port_vol
best_pair = (tickers[i], tickers[j])
return best_pair, best_volatility
pair, vol = find_best_pair(returns, tickers)
print(f"Best pair: {pair}, Volatility: {vol*100:.2f}%")
Exercise 4.3: Correlation Impact Analysis (Open-ended)
Your Task:
Build a function that:
- Takes two assets and simulates different correlation values (-1 to +1)
- Calculates the minimum achievable portfolio volatility for each correlation
- Returns a DataFrame showing the relationship between correlation and minimum risk
Your implementation:
Solution:
def correlation_impact_analysis(vol_a: float, vol_b: float,
ret_a: float, ret_b: float) -> pd.DataFrame:
"""Analyze how correlation affects minimum portfolio risk."""
correlations = np.linspace(-1, 1, 21)
results = []
for corr in correlations:
# Analytical minimum variance weight for asset A
numerator = vol_b**2 - vol_a * vol_b * corr
denominator = vol_a**2 + vol_b**2 - 2 * vol_a * vol_b * corr
if denominator > 0:
w_a = numerator / denominator
w_a = max(0, min(1, w_a)) # Bound to [0, 1]
else:
w_a = 0.5
w_b = 1 - w_a
# Calculate portfolio volatility
var = (w_a**2 * vol_a**2 + w_b**2 * vol_b**2 +
2 * w_a * w_b * vol_a * vol_b * corr)
min_vol = np.sqrt(max(var, 0))
results.append({
'Correlation': corr,
'Weight_A': w_a,
'Min_Volatility': min_vol,
            'Risk_Reduction': (1 - min_vol / (w_a * vol_a + w_b * vol_b)) * 100  # vs. weighted-average volatility at the same weights
})
return pd.DataFrame(results)
# Test with AAPL and GLD
analysis = correlation_impact_analysis(
annual_volatility['AAPL'], annual_volatility['GLD'],
annual_returns['AAPL'], annual_returns['GLD']
)
print(analysis.to_string(index=False))
Section 4.3: The Risk-Return Tradeoff
In finance, there's a fundamental relationship: earning a higher expected return generally requires bearing more risk.
In this section, you will learn:
- How to visualize the risk-return space
- The concept of dominated portfolios
- Finding optimal portfolios through random sampling
4.3.1 Visualizing Risk-Return Space
# Generate random portfolios
np.random.seed(42)
n_portfolios = 5000
portfolio_returns_list = []
portfolio_volatilities_list = []
portfolio_sharpes_list = []
portfolio_weights_list = []
for _ in range(n_portfolios):
weights = np.random.random(n_assets)
weights = weights / weights.sum()
ret = np.dot(weights, annual_returns)
vol = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights)))
sharpe = ret / vol
portfolio_returns_list.append(ret)
portfolio_volatilities_list.append(vol)
portfolio_sharpes_list.append(sharpe)
portfolio_weights_list.append(weights)
# Plot risk-return space
plt.figure(figsize=(12, 8))
scatter = plt.scatter(portfolio_volatilities_list, portfolio_returns_list,
c=portfolio_sharpes_list, cmap='viridis', alpha=0.5, s=10)
plt.colorbar(scatter, label='Sharpe Ratio')
# Individual assets
for ticker in tickers:
plt.scatter(annual_volatility[ticker], annual_returns[ticker],
s=200, marker='*', edgecolors='black', linewidth=2, zorder=5)
plt.annotate(ticker, (annual_volatility[ticker], annual_returns[ticker]),
xytext=(10, 5), textcoords='offset points', fontsize=12, fontweight='bold')
# Equal-weighted portfolio
plt.scatter(portfolio_volatility, portfolio_return,
s=300, marker='D', c='red', edgecolors='black', linewidth=2,
label='Equal-Weighted', zorder=5)
plt.xlabel('Volatility (Risk)', fontsize=12)
plt.ylabel('Expected Return', fontsize=12)
plt.title('Risk-Return Space: Individual Assets and Random Portfolios', fontsize=14, fontweight='bold')
plt.legend(loc='upper left')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# Find best portfolios from random sampling
max_sharpe_idx = np.argmax(portfolio_sharpes_list)
min_vol_idx = np.argmin(portfolio_volatilities_list)
print("Notable Portfolios from Random Sampling")
print("=" * 60)
print("\nMaximum Sharpe Ratio Portfolio:")
print(f" Return: {portfolio_returns_list[max_sharpe_idx]*100:.2f}%")
print(f" Volatility: {portfolio_volatilities_list[max_sharpe_idx]*100:.2f}%")
print(f" Sharpe: {portfolio_sharpes_list[max_sharpe_idx]:.4f}")
print("\nMinimum Volatility Portfolio:")
print(f" Return: {portfolio_returns_list[min_vol_idx]*100:.2f}%")
print(f" Volatility: {portfolio_volatilities_list[min_vol_idx]*100:.2f}%")
print(f" Sharpe: {portfolio_sharpes_list[min_vol_idx]:.4f}")
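The section goals mention dominated portfolios. A portfolio is dominated when another portfolio offers at least as much return with no more volatility. A minimal sketch of flagging the non-dominated points in a sample follows. The function name `non_dominated_mask` and the four sample points are hypothetical.

```python
import numpy as np

def non_dominated_mask(vols, rets) -> np.ndarray:
    """Boolean mask: True where no other point has <= volatility and >= return.

    Exact duplicates are treated as dominated after the first occurrence.
    """
    vols = np.asarray(vols, dtype=float)
    rets = np.asarray(rets, dtype=float)
    # Sort by volatility ascending; break ties by higher return first
    order = np.lexsort((-rets, vols))
    mask = np.zeros(len(vols), dtype=bool)
    best_ret = -np.inf
    for idx in order:
        # A point survives only if it beats every lower-volatility point on return
        if rets[idx] > best_ret:
            mask[idx] = True
            best_ret = rets[idx]
    return mask

# Hypothetical (volatility, return) points for illustration
sample_vols = [0.10, 0.15, 0.20, 0.12]
sample_rets = [0.05, 0.04, 0.09, 0.07]
print(non_dominated_mask(sample_vols, sample_rets))
```

In the sample, the second point is dominated because the first has both lower volatility and higher return. Applied to `portfolio_volatilities_list` and `portfolio_returns_list` from the sampling above, the mask would pick out the upper-left edge of the scatter plot.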
Exercise 4.4: Risk Contribution Analysis (Guided)
Your Task: Calculate the marginal and percentage risk contribution of each asset in a portfolio.
Fill in the blanks:
Solution:
def calculate_risk_contribution(weights: np.ndarray,
cov_matrix: pd.DataFrame) -> pd.DataFrame:
"""Calculate risk contribution of each asset."""
port_vol = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights)))
mcr = np.dot(cov_matrix, weights) / port_vol
ccr = weights * mcr
pcr = ccr / port_vol * 100
return pd.DataFrame({
'Weight': weights,
'Marginal_Risk': mcr,
'Component_Risk': ccr,
'Pct_of_Risk': pcr
}, index=cov_matrix.columns)
risk_contrib = calculate_risk_contribution(equal_weights, cov_matrix)
print(risk_contrib)
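The three quantities in the solution correspond to the following definitions, written in the notation of Section 4.1.2. The symbols $\text{MCR}_i$, $\text{CCR}_i$ and $\text{PCR}_i$ are introduced here to mirror the variable names `mcr`, `ccr` and `pcr`:

$$\text{MCR}_i = \frac{\partial \sigma_p}{\partial w_i} = \frac{(\Sigma \mathbf{w})_i}{\sigma_p}, \qquad \text{CCR}_i = w_i \,\text{MCR}_i, \qquad \text{PCR}_i = \frac{\text{CCR}_i}{\sigma_p} \times 100\%$$

Because $\sum_i w_i (\Sigma \mathbf{w})_i = \mathbf{w}^T \Sigma \mathbf{w} = \sigma_p^2$, the component contributions add up to the portfolio volatility, $\sum_i \text{CCR}_i = \sigma_p$, so the percentages sum to 100%.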
Section 4.4: Two-Asset Portfolio Analysis
Before tackling complex multi-asset portfolios, let's build intuition with just two assets.
In this section, you will learn:
- The two-asset portfolio formulas
- How to find the minimum variance portfolio analytically
- The effect of correlation on the portfolio frontier
4.4.1 Two-Asset Portfolio Formulas
For a portfolio of assets A and B:
Return: $R_p = w_A R_A + (1-w_A) R_B$
Variance (with $w_B = 1 - w_A$): $\sigma_p^2 = w_A^2 \sigma_A^2 + w_B^2 \sigma_B^2 + 2 w_A w_B \sigma_A \sigma_B \rho_{AB}$
Minimum Variance Weight:
$$w_A^* = \frac{\sigma_B^2 - \sigma_A \sigma_B \rho_{AB}}{\sigma_A^2 + \sigma_B^2 - 2\sigma_A \sigma_B \rho_{AB}}$$
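The minimum variance weight follows from setting the derivative of the variance to zero. A short derivation, substituting $w_B = 1 - w_A$:

$$\frac{d\sigma_p^2}{dw_A} = 2 w_A \sigma_A^2 - 2(1 - w_A)\sigma_B^2 + 2(1 - 2 w_A)\sigma_A \sigma_B \rho_{AB} = 0$$

$$w_A\left(\sigma_A^2 + \sigma_B^2 - 2\sigma_A \sigma_B \rho_{AB}\right) = \sigma_B^2 - \sigma_A \sigma_B \rho_{AB}$$

Dividing by the bracket gives $w_A^*$. The bracket equals the variance of $R_A - R_B$, so it is non-negative, and when it is strictly positive the stationary point is a minimum. The formula does not restrict $w_A^*$ to $[0, 1]$: for example, $w_A^*$ is negative (a short position in A) when $\rho_{AB} > \sigma_B / \sigma_A$.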
# Two-asset analysis: AAPL vs GLD
asset_a, asset_b = 'AAPL', 'GLD'
ret_a = annual_returns[asset_a]
ret_b = annual_returns[asset_b]
vol_a = annual_volatility[asset_a]
vol_b = annual_volatility[asset_b]
corr_ab = corr_matrix.loc[asset_a, asset_b]
print(f"Two-Asset Analysis: {asset_a} vs {asset_b}")
print("=" * 50)
print(f"\n{asset_a}: Return={ret_a*100:.2f}%, Vol={vol_a*100:.2f}%")
print(f"{asset_b}: Return={ret_b*100:.2f}%, Vol={vol_b*100:.2f}%")
print(f"Correlation: {corr_ab:.4f}")
# Generate portfolios across weight combinations
weights_a = np.linspace(0, 1, 101)
port_returns_2asset = []
port_vols_2asset = []
for w_a in weights_a:
w_b = 1 - w_a
ret = w_a * ret_a + w_b * ret_b
var = (w_a**2 * vol_a**2 + w_b**2 * vol_b**2 +
2 * w_a * w_b * vol_a * vol_b * corr_ab)
port_returns_2asset.append(ret)
port_vols_2asset.append(np.sqrt(var))
# Find minimum variance
min_var_idx = np.argmin(port_vols_2asset)
# Plot two-asset frontier
plt.figure(figsize=(12, 8))
plt.plot(port_vols_2asset, port_returns_2asset, 'b-', linewidth=3, label='Portfolio Combinations')
plt.scatter(vol_a, ret_a, s=300, marker='*', c='red', edgecolors='black',
linewidth=2, zorder=5, label=asset_a)
plt.scatter(vol_b, ret_b, s=300, marker='*', c='gold', edgecolors='black',
linewidth=2, zorder=5, label=asset_b)
plt.scatter(port_vols_2asset[min_var_idx], port_returns_2asset[min_var_idx],
s=200, marker='D', c='green', edgecolors='black', linewidth=2,
zorder=5, label=f'Min Var ({weights_a[min_var_idx]*100:.0f}% {asset_a})')
plt.xlabel('Volatility (Risk)', fontsize=12)
plt.ylabel('Expected Return', fontsize=12)
plt.title(f'Two-Asset Portfolio: {asset_a} and {asset_b}\n(Correlation: {corr_ab:.3f})',
fontsize=14, fontweight='bold')
plt.legend(loc='best')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Exercise 4.5: Analytical Minimum Variance (Open-ended)
Your Task:
Build a function that:
- Takes two asset volatilities and their correlation
- Calculates the optimal weight for minimum variance using the analytical formula
- Returns the optimal weights and the resulting portfolio volatility
Your implementation:
Solution:
def analytical_min_variance(vol_a: float, vol_b: float,
correlation: float) -> dict:
"""
Calculate minimum variance portfolio weights analytically.
Args:
vol_a: Volatility of asset A
vol_b: Volatility of asset B
correlation: Correlation between A and B
Returns:
Dictionary with optimal weights and portfolio volatility
"""
numerator = vol_b**2 - vol_a * vol_b * correlation
denominator = vol_a**2 + vol_b**2 - 2 * vol_a * vol_b * correlation
if denominator == 0:
w_a = 0.5
else:
w_a = numerator / denominator
w_b = 1 - w_a
# Calculate portfolio volatility
port_var = (w_a**2 * vol_a**2 + w_b**2 * vol_b**2 +
2 * w_a * w_b * vol_a * vol_b * correlation)
port_vol = np.sqrt(max(port_var, 0))
return {
'weight_a': w_a,
'weight_b': w_b,
'portfolio_volatility': port_vol
}
# Test
result = analytical_min_variance(vol_a, vol_b, corr_ab)
print(f"Optimal weight A: {result['weight_a']*100:.2f}%")
print(f"Optimal weight B: {result['weight_b']*100:.2f}%")
print(f"Min portfolio vol: {result['portfolio_volatility']*100:.2f}%")
Exercise 4.6: Complete Portfolio Analyzer (Open-ended)
Your Task:
Build a PortfolioAnalyzer class that:
- Takes a list of tickers and downloads data
- Calculates all individual asset statistics
- Computes correlation and covariance matrices
- Generates random portfolios and finds the best ones
- Provides a summary method that displays all key metrics
Your implementation:
Solution:
class PortfolioAnalyzer:
"""Comprehensive portfolio analysis tool."""
def __init__(self, tickers: list, years: int = 5):
self.tickers = tickers
self.n_assets = len(tickers)
self._load_data(years)
self._calculate_statistics()
def _load_data(self, years: int):
"""Download and prepare price data."""
end = datetime.now()
start = end - timedelta(days=years*365)
data = yf.download(self.tickers, start=start, end=end, progress=False)
if isinstance(data.columns, pd.MultiIndex):
self.prices = data['Adj Close'] if 'Adj Close' in data.columns.get_level_values(0) else data['Close']
else:
self.prices = data['Adj Close'] if 'Adj Close' in data.columns else data['Close']
self.prices.columns = [str(c) for c in self.prices.columns]
self.returns = self.prices.pct_change().dropna()
def _calculate_statistics(self):
"""Calculate all statistics."""
self.annual_returns = self.returns.mean() * 252
self.annual_volatility = self.returns.std() * np.sqrt(252)
self.cov_matrix = self.returns.cov() * 252
self.corr_matrix = self.returns.corr()
def portfolio_stats(self, weights: np.ndarray) -> dict:
"""Calculate portfolio statistics."""
ret = np.dot(weights, self.annual_returns)
vol = np.sqrt(np.dot(weights.T, np.dot(self.cov_matrix, weights)))
return {'return': ret, 'volatility': vol, 'sharpe': ret / vol}
def find_optimal_portfolios(self, n_samples: int = 5000) -> dict:
"""Find optimal portfolios through random sampling."""
np.random.seed(42)
best_sharpe = {'sharpe': -np.inf}
min_vol = {'volatility': np.inf}
for _ in range(n_samples):
w = np.random.random(self.n_assets)
w = w / w.sum()
stats = self.portfolio_stats(w)
stats['weights'] = w
if stats['sharpe'] > best_sharpe['sharpe']:
best_sharpe = stats.copy()
if stats['volatility'] < min_vol['volatility']:
min_vol = stats.copy()
return {'max_sharpe': best_sharpe, 'min_volatility': min_vol}
def summary(self):
"""Display comprehensive summary."""
print("=" * 60)
print("PORTFOLIO ANALYZER SUMMARY")
print("=" * 60)
print("\nAsset Statistics:")
stats_df = pd.DataFrame({
'Return': self.annual_returns,
'Volatility': self.annual_volatility,
'Sharpe': self.annual_returns / self.annual_volatility
})
print(stats_df.round(4))
optimal = self.find_optimal_portfolios()
print("\nOptimal Portfolios:")
for name, stats in optimal.items():
print(f"\n{name}:")
print(f" Return: {stats['return']*100:.2f}%")
print(f" Volatility: {stats['volatility']*100:.2f}%")
print(f" Sharpe: {stats['sharpe']:.4f}")
# Test
analyzer = PortfolioAnalyzer(['AAPL', 'MSFT', 'JNJ', 'GLD'])
analyzer.summary()
Module Project: Build Your Own Diversified Portfolio
Apply everything you've learned to construct and analyze a diversified portfolio.
Your Challenge:
Build a complete portfolio analysis that:
1. Creates a custom portfolio with your chosen weights
2. Calculates all risk and return metrics
3. Compares to individual assets and equal-weighted benchmark
4. Analyzes risk contribution by asset
5. Visualizes the results
# YOUR CODE HERE - Module Project
Solution:
# Complete Portfolio Analysis Project
# Step 1: Define custom portfolio
my_weights = np.array([0.35, 0.30, 0.15, 0.10, 0.10])
print("My Portfolio Allocation")
print("=" * 40)
for ticker, weight in zip(tickers, my_weights):
print(f" {ticker}: {weight*100:.1f}%")
# Step 2: Calculate statistics
my_return = np.dot(my_weights, annual_returns)
my_variance = np.dot(my_weights.T, np.dot(cov_matrix, my_weights))
my_volatility = np.sqrt(my_variance)
my_sharpe = my_return / my_volatility
print(f"\nPortfolio Statistics")
print(f" Return: {my_return*100:.2f}%")
print(f" Volatility: {my_volatility*100:.2f}%")
print(f" Sharpe: {my_sharpe:.4f}")
# Step 3: Compare to benchmarks
comparison = pd.DataFrame({
'My Portfolio': [my_return, my_volatility, my_sharpe],
'Equal-Weighted': [portfolio_return, portfolio_volatility, portfolio_return/portfolio_volatility]
}, index=['Return', 'Volatility', 'Sharpe'])
for ticker in tickers:
comparison[ticker] = [annual_returns[ticker],
annual_volatility[ticker],
annual_returns[ticker]/annual_volatility[ticker]]
print("\nComparison")
print(comparison.round(4).T)
# Step 4: Risk contribution
mcr = np.dot(cov_matrix, my_weights) / my_volatility
ccr = my_weights * mcr
pcr = ccr / my_volatility * 100
print("\nRisk Contribution")
risk_df = pd.DataFrame({
'Weight': my_weights,
'Pct_Risk': pcr
}, index=tickers)
print(risk_df.round(2))
# Step 5: Visualization
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
axes[0].pie(my_weights, labels=tickers, autopct='%1.1f%%')
axes[0].set_title('Weight Allocation')
axes[1].pie(pcr, labels=tickers, autopct='%1.1f%%')
axes[1].set_title('Risk Contribution')
plt.tight_layout()
plt.show()
Key Takeaways
What You Learned
1. Portfolio Returns
- Portfolio return is the weighted average of asset returns
- Formula: $R_p = \sum w_i R_i$
2. Portfolio Risk
- Portfolio risk is NOT a weighted average
- Depends on covariances between assets
- Formula: $\sigma_p^2 = \mathbf{w}^T \Sigma \mathbf{w}$
3. Diversification
- Lower correlation = better diversification
- Can only eliminate idiosyncratic risk, not systematic risk
- The "only free lunch" in finance
4. Two-Asset Portfolios
- Analytical solutions exist for minimum variance
- Correlation determines the shape of the frontier
Key Formulas
| Metric | Formula |
|---|---|
| Portfolio Return | $R_p = \sum w_i R_i$ |
| Portfolio Variance | $\sigma_p^2 = \mathbf{w}^T \Sigma \mathbf{w}$ |
| Min Variance Weight | $w_A^* = \frac{\sigma_B^2 - \sigma_A\sigma_B\rho}{\sigma_A^2 + \sigma_B^2 - 2\sigma_A\sigma_B\rho}$ |
Coming Up Next
In Module 5: Mean-Variance Optimization, we'll learn how to find the optimal portfolio weights using mathematical optimization.
Congratulations on completing Module 4!
Module 5: Mean-Variance Optimization
Course 3: Quantitative Finance
Part 2: Portfolio Theory
Learning Objectives
By the end of this module, you will be able to:
- Formulate portfolio optimization as a mathematical problem
- Implement optimization using scipy
- Apply realistic constraints (long-only, position limits)
- Find optimal portfolios for different objectives
| Attribute | Value |
|---|---|
| Duration | ~2.5 hours |
| Exercises | 6 (3 guided + 3 open-ended) |
| Prerequisites | Module 4: Portfolio Basics |
Setup and Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
from datetime import datetime, timedelta
from scipy.optimize import minimize
import warnings
warnings.filterwarnings('ignore')
pd.set_option('display.float_format', '{:.4f}'.format)
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
np.random.seed(42)
print('Libraries loaded successfully!')
Load Data
# Download data for portfolio optimization
tickers = ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'JNJ', 'JPM', 'XOM', 'GLD']
end_date = datetime.now()
start_date = end_date - timedelta(days=5*365)
print("Downloading portfolio data...")
data = yf.download(tickers, start=start_date, end=end_date, progress=False)
# Handle MultiIndex columns
if isinstance(data.columns, pd.MultiIndex):
if 'Adj Close' in data.columns.get_level_values(0):
prices = data['Adj Close']
elif 'Close' in data.columns.get_level_values(0):
prices = data['Close']
else:
prices = data.xs('Close', axis=1, level=1) if 'Close' in data.columns.get_level_values(1) else data.iloc[:, :len(tickers)]
else:
prices = data['Adj Close'] if 'Adj Close' in data.columns else data['Close']
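prices.columns = [str(col) for col in prices.columns]
prices = prices[tickers]  # yfinance may return columns alphabetically; restore the order of `tickers` so weights line up with labels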
returns = prices.pct_change().dropna()
# Calculate annualized statistics
annual_returns = returns.mean() * 252
annual_volatility = returns.std() * np.sqrt(252)
cov_matrix = returns.cov() * 252
n_assets = len(tickers)
print(f"\nData loaded: {len(prices)} trading days")
print(f"Assets: {list(prices.columns)}")
# Display asset statistics
stats_df = pd.DataFrame({
'Expected Return': annual_returns,
'Volatility': annual_volatility,
'Sharpe (rf=0)': annual_returns / annual_volatility
}).sort_values('Sharpe (rf=0)', ascending=False)
print("Individual Asset Statistics (Annualized)")
print("=" * 55)
print(stats_df.round(4))
Section 5.1: The Optimization Problem
Mean-Variance Optimization (MVO) is the mathematical framework introduced by Harry Markowitz in 1952, work for which he shared the 1990 Nobel Memorial Prize in Economic Sciences.
In this section, you will learn:
- How to formulate portfolio optimization mathematically
- The minimum variance and maximum Sharpe objectives
- Core portfolio metric functions
5.1.1 Mathematical Formulation
Minimize Portfolio Variance:
$$\min_{\mathbf{w}} \quad \mathbf{w}^T \Sigma \mathbf{w}$$
Subject to:
$$\mathbf{w}^T \mathbf{\mu} = R_{target} \quad \text{(target return)}$$
$$\mathbf{w}^T \mathbf{1} = 1 \quad \text{(weights sum to 1)}$$
Where:
- $\mathbf{w}$ = vector of portfolio weights
- $\Sigma$ = covariance matrix
- $\mathbf{\mu}$ = vector of expected returns
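One way to see what the optimizer is solving is the Lagrangian of this problem. The multipliers $\lambda$ and $\gamma$ are notation introduced here for illustration, and $\mathbf{1}$ denotes a vector of ones:

$$\mathcal{L}(\mathbf{w}, \lambda, \gamma) = \mathbf{w}^T \Sigma \mathbf{w} - \lambda\left(\mathbf{w}^T \mathbf{\mu} - R_{target}\right) - \gamma\left(\mathbf{w}^T \mathbf{1} - 1\right)$$

$$\frac{\partial \mathcal{L}}{\partial \mathbf{w}} = 2\Sigma \mathbf{w} - \lambda \mathbf{\mu} - \gamma \mathbf{1} = 0 \quad\Rightarrow\quad \mathbf{w} = \tfrac{1}{2}\Sigma^{-1}\left(\lambda \mathbf{\mu} + \gamma \mathbf{1}\right)$$

With only these two equality constraints and an invertible $\Sigma$, the optimal weights are a linear combination of $\Sigma^{-1}\mathbf{\mu}$ and $\Sigma^{-1}\mathbf{1}$. Once inequality constraints such as long-only bounds are added, a numerical solver is generally needed.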
# Define core portfolio functions
def portfolio_return(weights: np.ndarray, returns: pd.Series) -> float:
"""Calculate portfolio expected return."""
return np.dot(weights, returns)
def portfolio_volatility(weights: np.ndarray, cov_matrix: pd.DataFrame) -> float:
"""Calculate portfolio volatility."""
return np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights)))
def portfolio_sharpe(weights: np.ndarray, returns: pd.Series,
cov_matrix: pd.DataFrame, rf: float = 0) -> float:
"""Calculate portfolio Sharpe ratio."""
ret = portfolio_return(weights, returns)
vol = portfolio_volatility(weights, cov_matrix)
return (ret - rf) / vol
print("Portfolio functions defined")
# Test with equal weights
equal_weights = np.array([1/n_assets] * n_assets)
print("Equal-Weighted Portfolio Test")
print("=" * 40)
print(f"Return: {portfolio_return(equal_weights, annual_returns)*100:.2f}%")
print(f"Volatility: {portfolio_volatility(equal_weights, cov_matrix)*100:.2f}%")
print(f"Sharpe Ratio: {portfolio_sharpe(equal_weights, annual_returns, cov_matrix):.4f}")
5.1.2 The Global Minimum Variance Portfolio
# Objective functions for optimization
def neg_sharpe(weights, returns, cov_matrix):
"""Negative Sharpe ratio (for minimization)."""
return -portfolio_sharpe(weights, returns, cov_matrix)
def port_variance(weights, cov_matrix):
"""Portfolio variance (for minimization)."""
return np.dot(weights.T, np.dot(cov_matrix, weights))
# Constraints: weights sum to 1
constraints = [{'type': 'eq', 'fun': lambda w: np.sum(w) - 1}]
initial_weights = np.array([1/n_assets] * n_assets)
# Optimize for minimum variance (unconstrained - allows short selling)
result_minvar = minimize(
port_variance,
initial_weights,
args=(cov_matrix,),
method='SLSQP',
constraints=constraints
)
minvar_weights = result_minvar['x']
print("Global Minimum Variance Portfolio (Unconstrained)")
print("=" * 55)
print(f"\nOptimal Weights:")
for ticker, weight in zip(tickers, minvar_weights):
print(f" {ticker}: {weight*100:+.2f}%")
print(f"\nPortfolio Statistics:")
print(f" Return: {portfolio_return(minvar_weights, annual_returns)*100:.2f}%")
print(f" Volatility: {portfolio_volatility(minvar_weights, cov_matrix)*100:.2f}%")
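When the only constraint is that weights sum to 1, the global minimum variance portfolio also has a closed form, $\mathbf{w} = \Sigma^{-1}\mathbf{1} / (\mathbf{1}^T \Sigma^{-1} \mathbf{1})$, which can serve as a sanity check on the SLSQP result. A minimal sketch follows, using a small hypothetical covariance matrix rather than the downloaded data. The helper name `gmv_closed_form` is also hypothetical.

```python
import numpy as np

def gmv_closed_form(cov: np.ndarray) -> np.ndarray:
    """Closed-form global minimum variance weights (budget constraint only).

    Assumes `cov` is symmetric and invertible; short positions are allowed.
    """
    ones = np.ones(cov.shape[0])
    # Solve Sigma x = 1 instead of forming the inverse explicitly
    x = np.linalg.solve(cov, ones)
    return x / x.sum()

# Hypothetical 3-asset annualized covariance matrix (illustration only)
cov = np.array([
    [0.0400, 0.0060, 0.0000],
    [0.0060, 0.0900, 0.0120],
    [0.0000, 0.0120, 0.0225],
])

w_gmv = gmv_closed_form(cov)
w_eq = np.full(3, 1 / 3)

print("GMV weights:", np.round(w_gmv, 4))
print(f"GMV volatility:          {np.sqrt(w_gmv @ cov @ w_gmv):.4f}")
print(f"Equal-weight volatility: {np.sqrt(w_eq @ cov @ w_eq):.4f}")
```

With the course data, `gmv_closed_form(cov_matrix.values)` should agree closely with `minvar_weights`, up to solver tolerance.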
Exercise 5.1: Maximum Sharpe Portfolio (Guided)
Your Task: Find the portfolio that maximizes the Sharpe ratio using scipy.optimize.minimize.
Fill in the blanks:
Solution:
def find_max_sharpe(returns: pd.Series, cov_matrix: pd.DataFrame) -> np.ndarray:
"""Find the maximum Sharpe ratio portfolio weights."""
n = len(returns)
initial = np.ones(n) / n
constraints = [{'type': 'eq', 'fun': lambda w: np.sum(w) - 1}]
result = minimize(
neg_sharpe,
initial,
args=(returns, cov_matrix),
method='SLSQP',
constraints=constraints
)
return result['x'] if result['success'] else None
maxsharpe_weights = find_max_sharpe(annual_returns, cov_matrix)
print(f"Max Sharpe: {portfolio_sharpe(maxsharpe_weights, annual_returns, cov_matrix):.4f}")
print(f"Return: {portfolio_return(maxsharpe_weights, annual_returns)*100:.2f}%")
print(f"Volatility: {portfolio_volatility(maxsharpe_weights, cov_matrix)*100:.2f}%")
Section 5.2: Solving with Scipy
Without bounds on the weights, the optimizer is free to take extreme long and short positions. In practice, many portfolios are not allowed to sell short.
In this section, you will learn:
- How to use scipy.optimize.minimize effectively
- Adding bounds for long-only constraints
- Building a reusable optimizer class
5.2.1 Long-Only Constraint
# Long-only constraint: 0 <= w_i <= 1
bounds = tuple((0, 1) for _ in range(n_assets))
# Minimum variance with long-only constraint
result_minvar_long = minimize(
port_variance,
initial_weights,
args=(cov_matrix,),
method='SLSQP',
bounds=bounds,
constraints=constraints
)
minvar_long_weights = result_minvar_long['x']
print("Minimum Variance Portfolio (Long-Only)")
print("=" * 55)
for ticker, weight in zip(tickers, minvar_long_weights):
if weight > 0.001:
print(f" {ticker}: {weight*100:.2f}%")
print(f"\nPortfolio Statistics:")
print(f" Return: {portfolio_return(minvar_long_weights, annual_returns)*100:.2f}%")
print(f" Volatility: {portfolio_volatility(minvar_long_weights, cov_matrix)*100:.2f}%")
# Compare unconstrained vs long-only
print("Impact of Long-Only Constraint")
print("=" * 50)
print(f"Unconstrained volatility: {portfolio_volatility(minvar_weights, cov_matrix)*100:.2f}%")
print(f"Long-only volatility: {portfolio_volatility(minvar_long_weights, cov_matrix)*100:.2f}%")
cost = portfolio_volatility(minvar_long_weights, cov_matrix) - portfolio_volatility(minvar_weights, cov_matrix)
print(f"\nCost of constraint: +{cost*100:.2f}% volatility")
5.2.2 Reusable Portfolio Optimizer Class
class PortfolioOptimizer:
"""Mean-variance portfolio optimization."""
def __init__(self, returns: pd.Series, cov_matrix: pd.DataFrame, rf: float = 0):
self.returns = returns
self.cov_matrix = cov_matrix
self.rf = rf
self.n_assets = len(returns)
self.tickers = list(returns.index)
def _port_return(self, weights):
return np.dot(weights, self.returns)
def _port_volatility(self, weights):
return np.sqrt(np.dot(weights.T, np.dot(self.cov_matrix, weights)))
def _port_sharpe(self, weights):
return (self._port_return(weights) - self.rf) / self._port_volatility(weights)
def minimize_volatility(self, target_return=None, long_only=True):
"""Find minimum volatility portfolio."""
constraints = [{'type': 'eq', 'fun': lambda w: np.sum(w) - 1}]
if target_return is not None:
constraints.append({
'type': 'eq',
'fun': lambda w: self._port_return(w) - target_return
})
bounds = tuple((0, 1) for _ in range(self.n_assets)) if long_only else None
result = minimize(
lambda w: self._port_volatility(w),
np.ones(self.n_assets) / self.n_assets,
method='SLSQP',
bounds=bounds,
constraints=constraints
)
return result['x'] if result['success'] else None
def maximize_sharpe(self, long_only=True):
"""Find maximum Sharpe ratio portfolio."""
constraints = [{'type': 'eq', 'fun': lambda w: np.sum(w) - 1}]
bounds = tuple((0, 1) for _ in range(self.n_assets)) if long_only else None
result = minimize(
lambda w: -self._port_sharpe(w),
np.ones(self.n_assets) / self.n_assets,
method='SLSQP',
bounds=bounds,
constraints=constraints
)
return result['x'] if result['success'] else None
def get_stats(self, weights):
"""Get portfolio statistics."""
return {
'return': self._port_return(weights),
'volatility': self._port_volatility(weights),
'sharpe': self._port_sharpe(weights),
'weights': dict(zip(self.tickers, weights))
}
print("PortfolioOptimizer class defined")
# Test the optimizer
optimizer = PortfolioOptimizer(annual_returns, cov_matrix)
min_vol_w = optimizer.minimize_volatility()
max_sharpe_w = optimizer.maximize_sharpe()
print("Portfolio Optimizer Results")
print("=" * 60)
print("\nMinimum Volatility Portfolio:")
stats = optimizer.get_stats(min_vol_w)
print(f" Return: {stats['return']*100:.2f}%")
print(f" Volatility: {stats['volatility']*100:.2f}%")
print(f" Sharpe: {stats['sharpe']:.4f}")
print("\nMaximum Sharpe Portfolio:")
stats = optimizer.get_stats(max_sharpe_w)
print(f" Return: {stats['return']*100:.2f}%")
print(f" Volatility: {stats['volatility']*100:.2f}%")
print(f" Sharpe: {stats['sharpe']:.4f}")
Exercise 5.2: Add Efficient Frontier Method (Guided)
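Your Task: Add efficient frontier generation to the optimizer. The solution does this as a standalone function that takes the optimizer as its first argument, so it can also be attached to the class as a method.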
Fill in the blanks:
Solution:
def efficient_frontier(optimizer, n_points: int = 50) -> pd.DataFrame:
"""Generate efficient frontier points."""
min_ret = optimizer.returns.min()
max_ret = optimizer.returns.max()
target_returns = np.linspace(min_ret, max_ret, n_points)
frontier_vols = []
frontier_rets = []
for target in target_returns:
weights = optimizer.minimize_volatility(target_return=target, long_only=True)
if weights is not None:
frontier_rets.append(optimizer._port_return(weights))
frontier_vols.append(optimizer._port_volatility(weights))
return pd.DataFrame({'return': frontier_rets, 'volatility': frontier_vols})
frontier = efficient_frontier(optimizer)
print(frontier.head())
# Plot
plt.figure(figsize=(10, 6))
plt.plot(frontier['volatility'], frontier['return'], 'b-', linewidth=2)
plt.xlabel('Volatility')
plt.ylabel('Return')
plt.title('Efficient Frontier')
plt.grid(True, alpha=0.3)
plt.show()
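Sweeping target returns from the lowest to the highest asset return traces the whole minimum-variance frontier, including points below the minimum volatility portfolio that offer less return for the same risk. A minimal sketch of keeping only the efficient (upper) branch follows. The helper name `efficient_part` and the sample numbers are hypothetical, and the frame is assumed to have the `return` and `volatility` columns produced above.

```python
import pandas as pd

def efficient_part(frontier: pd.DataFrame) -> pd.DataFrame:
    """Keep frontier points at or above the minimum-volatility point's return."""
    if frontier.empty:
        return frontier
    # Return level of the lowest-volatility point on the frontier
    min_vol_return = frontier.loc[frontier['volatility'].idxmin(), 'return']
    mask = frontier['return'] >= min_vol_return
    return frontier[mask].sort_values('return').reset_index(drop=True)

# Hypothetical frontier points (illustration only)
sample = pd.DataFrame({
    'return':     [0.02, 0.04, 0.06, 0.08, 0.10],
    'volatility': [0.14, 0.11, 0.10, 0.12, 0.16],
})
print(efficient_part(sample))
```

In the sample, the lowest volatility occurs at a 6% return, so the two points with lower returns are dropped.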
Section 5.3: Portfolio Constraints
Real-world portfolios have many constraints beyond "no short selling".
In this section, you will learn:
- Position limits (max/min weights)
- Sector constraints
- Combining multiple constraint types
5.3.1 Position Limits
# Max 25% in any single asset
max_weight = 0.25
bounds_constrained = tuple((0, max_weight) for _ in range(n_assets))
result_constrained = minimize(
lambda w: -portfolio_sharpe(w, annual_returns, cov_matrix),
initial_weights,
method='SLSQP',
bounds=bounds_constrained,
constraints=constraints
)
constrained_weights = result_constrained['x']
print(f"Maximum Sharpe Portfolio (Max {max_weight*100:.0f}% per asset)")
print("=" * 55)
for ticker, weight in zip(tickers, constrained_weights):
if weight > 0.001:
marker = " <- AT LIMIT" if abs(weight - max_weight) < 0.001 else ""
print(f" {ticker}: {weight*100:.2f}%{marker}")
print(f"\nSharpe: {portfolio_sharpe(constrained_weights, annual_returns, cov_matrix):.4f}")
5.3.2 Sector Constraints
# Define sector mappings
sectors = {
'AAPL': 'Technology', 'MSFT': 'Technology',
'GOOGL': 'Technology', 'AMZN': 'Technology',
'JNJ': 'Healthcare', 'JPM': 'Financial',
'XOM': 'Energy', 'GLD': 'Commodities'
}
# Get tech stock indices
tech_idx = [i for i, t in enumerate(tickers) if sectors[t] == 'Technology']
print(f"Technology stocks: {[tickers[i] for i in tech_idx]}")
# Constraint: Tech sector <= 50%
max_tech = 0.50
sector_constraints = [
{'type': 'eq', 'fun': lambda w: np.sum(w) - 1},
{'type': 'ineq', 'fun': lambda w: max_tech - sum(w[i] for i in tech_idx)}
]
result_sector = minimize(
lambda w: -portfolio_sharpe(w, annual_returns, cov_matrix),
initial_weights,
method='SLSQP',
bounds=bounds, # long-only
constraints=sector_constraints
)
sector_weights = result_sector['x']
# Calculate sector exposures
sector_exposure = {}
for ticker, weight in zip(tickers, sector_weights):
sector = sectors[ticker]
sector_exposure[sector] = sector_exposure.get(sector, 0) + weight
print(f"Maximum Sharpe Portfolio (Tech <= {max_tech*100:.0f}%)")
print("=" * 55)
print("\nSector Exposure:")
for sector, exposure in sorted(sector_exposure.items(), key=lambda x: -x[1]):
marker = " <- AT LIMIT" if sector == 'Technology' and abs(exposure - max_tech) < 0.01 else ""
print(f" {sector}: {exposure*100:.2f}%{marker}")
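Position limits and sector caps can be applied in the same optimization: the per-asset limits go into `bounds`, and each sector cap becomes one more inequality constraint. The sketch below is self-contained and uses made-up tickers, sectors, returns, and volatilities (all `demo_*` names are hypothetical, not the course data), so treat it as an illustration of the pattern rather than a reference implementation.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical inputs for illustration only (not the course data)
demo_tickers = ['T1', 'T2', 'T3', 'B1', 'B2']
demo_sectors = ['Tech', 'Tech', 'Tech', 'Other', 'Other']
demo_returns = np.array([0.14, 0.12, 0.13, 0.06, 0.05])
demo_vols = np.array([0.25, 0.22, 0.24, 0.10, 0.08])
demo_corr = np.full((5, 5), 0.3) + 0.7 * np.eye(5)  # 1 on the diagonal, 0.3 elsewhere
demo_cov = np.outer(demo_vols, demo_vols) * demo_corr

def neg_sharpe(w, mu, cov):
    """Negative Sharpe ratio (risk-free rate assumed to be 0)."""
    return -(w @ mu) / np.sqrt(w @ cov @ w)

tech_idx = [i for i, s in enumerate(demo_sectors) if s == 'Tech']
max_weight = 0.35   # position limit per asset
max_tech = 0.50     # cap on total technology exposure

combined_bounds = tuple((0.0, max_weight) for _ in demo_tickers)
combined_constraints = [
    {'type': 'eq', 'fun': lambda w: np.sum(w) - 1},                      # fully invested
    {'type': 'ineq', 'fun': lambda w: max_tech - np.sum(w[tech_idx])},   # sector cap
]

# Start from a point that already satisfies every constraint
x0 = np.array([0.15, 0.15, 0.15, 0.275, 0.275])

res = minimize(neg_sharpe, x0, args=(demo_returns, demo_cov),
               method='SLSQP', bounds=combined_bounds,
               constraints=combined_constraints)
combined_w = res.x

for ticker, weight in zip(demo_tickers, combined_w):
    print(f"  {ticker}: {weight*100:.2f}%")
print(f"Tech exposure: {combined_w[tech_idx].sum()*100:.2f}%")
```

Keeping simple per-asset limits in `bounds` rather than in `constraints` is a design choice: SLSQP treats bounds directly, which usually keeps the constraint list short and easier to read.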
Exercise 5.3: Custom Constraints (Open-ended)
Your Task:
Build a function that:
- Takes min_weight and max_weight parameters
- Ensures every asset has at least min_weight allocation
- Maximizes Sharpe ratio within these constraints
- Returns the optimal weights and portfolio statistics
Your implementation:
Click to reveal solution
def optimize_with_bounds(returns: pd.Series, cov_matrix: pd.DataFrame,
                         min_weight: float = 0.05,
                         max_weight: float = 0.25) -> dict:
    """
    Optimize portfolio with minimum and maximum weight constraints.

    Args:
        returns: Expected returns
        cov_matrix: Covariance matrix
        min_weight: Minimum weight per asset
        max_weight: Maximum weight per asset

    Returns:
        Dictionary with weights and statistics, or None if optimization fails
    """
    n = len(returns)
    tickers = list(returns.index)

    # Weights can only sum to 1 if n * min_weight <= 1 <= n * max_weight
    if n * min_weight > 1 + 1e-9 or n * max_weight < 1 - 1e-9:
        raise ValueError("Bounds are infeasible: weights cannot sum to 1")

    # Bounds with min and max
    bounds = tuple((min_weight, max_weight) for _ in range(n))

    # Constraint: weights sum to 1
    constraints = [{'type': 'eq', 'fun': lambda w: np.sum(w) - 1}]

    # Objective: maximize Sharpe (minimize negative Sharpe)
    def neg_sharpe(w):
        ret = np.dot(w, returns)
        vol = np.sqrt(np.dot(w.T, np.dot(cov_matrix, w)))
        return -ret / vol

    result = minimize(
        neg_sharpe,
        np.ones(n) / n,
        method='SLSQP',
        bounds=bounds,
        constraints=constraints
    )

    if result['success']:
        weights = result['x']
        ret = np.dot(weights, returns)
        vol = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights)))
        return {
            'weights': dict(zip(tickers, weights)),
            'return': ret,
            'volatility': vol,
            'sharpe': ret / vol
        }
    return None

# Test
result = optimize_with_bounds(annual_returns, cov_matrix, 0.05, 0.20)
if result is None:
    print("Optimization failed")
else:
    print("Constrained Portfolio (5% <= w <= 20%)")
    print(f"Return: {result['return']*100:.2f}%")
    print(f"Volatility: {result['volatility']*100:.2f}%")
    print(f"Sharpe: {result['sharpe']:.4f}")
    print("\nWeights:")
    for ticker, w in result['weights'].items():
        print(f"  {ticker}: {w*100:.2f}%")
Section 5.4: Target Return & Risk
Sometimes we want a portfolio that meets specific objectives like a target return or risk budget.
In this section, you will learn:
- Optimizing for a target return
- Optimizing for a target volatility
- Visualizing optimization results
5.4.1 Optimize for Target Return
def optimize_for_target_return(target_return: float, returns: pd.Series,
cov_matrix: pd.DataFrame, long_only: bool = True):
"""Find minimum volatility portfolio for a target return."""
n = len(returns)
constraints = [
{'type': 'eq', 'fun': lambda w: np.sum(w) - 1},
{'type': 'eq', 'fun': lambda w: np.dot(w, returns) - target_return}
]
bounds = tuple((0, 1) for _ in range(n)) if long_only else None
result = minimize(
lambda w: np.sqrt(np.dot(w.T, np.dot(cov_matrix, w))),
np.ones(n) / n,
method='SLSQP',
bounds=bounds,
constraints=constraints
)
return result['x'] if result['success'] else None
# Test with different targets
target_returns = [0.08, 0.12, 0.16, 0.20]
print("Portfolios for Different Target Returns")
print("=" * 55)
for target in target_returns:
weights = optimize_for_target_return(target, annual_returns, cov_matrix)
if weights is not None:
vol = portfolio_volatility(weights, cov_matrix)
print(f"Target {target*100:.0f}%: Volatility = {vol*100:.2f}%")
else:
print(f"Target {target*100:.0f}%: Not achievable")
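A target can come back as "Not achievable" for a simple reason: with long-only weights that sum to 1, the portfolio return is a weighted average of the asset returns, so it can never fall below the lowest or rise above the highest single-asset return. The helper below is a hypothetical pre-check (the name `target_is_feasible_long_only` and the demo numbers are assumptions for illustration) that tests this before calling the optimizer.

```python
import numpy as np

def target_is_feasible_long_only(target_return, expected_returns):
    """Return True if a long-only, fully invested portfolio can hit the target.

    With w >= 0 and sum(w) == 1 the portfolio return is a convex combination
    of the asset returns, so it must lie between their minimum and maximum.
    """
    mu = np.asarray(expected_returns, dtype=float)
    return bool(mu.min() <= target_return <= mu.max())

# Hypothetical expected returns for three assets
demo_mu = np.array([0.06, 0.10, 0.15])

for target in [0.08, 0.12, 0.16, 0.20]:
    status = "feasible" if target_is_feasible_long_only(target, demo_mu) else "out of range"
    print(f"Target {target*100:.0f}%: {status}")
```

This check only covers the long-only case. Once short selling is allowed (`long_only=False`), the range is no longer limited by the individual asset returns.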
5.4.2 Optimize for Target Volatility
def optimize_for_target_volatility(target_vol: float, returns: pd.Series,
cov_matrix: pd.DataFrame, long_only: bool = True):
"""Find maximum return portfolio for a target volatility."""
n = len(returns)
constraints = [
{'type': 'eq', 'fun': lambda w: np.sum(w) - 1},
{'type': 'eq', 'fun': lambda w: np.sqrt(np.dot(w.T, np.dot(cov_matrix, w))) - target_vol}
]
bounds = tuple((0, 1) for _ in range(n)) if long_only else None
result = minimize(
lambda w: -np.dot(w, returns),
np.ones(n) / n,
method='SLSQP',
bounds=bounds,
constraints=constraints
)
return result['x'] if result['success'] else None
# Test with different targets
target_vols = [0.12, 0.16, 0.20, 0.25]
print("Portfolios for Different Target Volatilities")
print("=" * 55)
for target in target_vols:
weights = optimize_for_target_volatility(target, annual_returns, cov_matrix)
if weights is not None:
ret = portfolio_return(weights, annual_returns)
print(f"Target Vol {target*100:.0f}%: Return = {ret*100:.2f}%")
else:
print(f"Target Vol {target*100:.0f}%: Not achievable")
Exercise 5.4: Efficient Frontier Visualization (Guided)
Your Task: Generate and plot the efficient frontier with random portfolios for context.
Fill in the blanks:
Click to reveal solution
def plot_efficient_frontier(returns: pd.Series, cov_matrix: pd.DataFrame,
n_frontier: int = 30, n_random: int = 3000):
"""Plot efficient frontier with random portfolios."""
n = len(returns)
tickers = list(returns.index)
np.random.seed(42)
random_rets = []
random_vols = []
for _ in range(n_random):
w = np.random.random(n)
w = w / w.sum()
random_rets.append(np.dot(w, returns))
random_vols.append(np.sqrt(np.dot(w.T, np.dot(cov_matrix, w))))
min_ret = returns.min()
max_ret = returns.max()
targets = np.linspace(min_ret, max_ret, n_frontier)
frontier_rets = []
frontier_vols = []
for target in targets:
w = optimize_for_target_return(target, returns, cov_matrix)
if w is not None:
frontier_rets.append(np.dot(w, returns))
frontier_vols.append(np.sqrt(np.dot(w.T, np.dot(cov_matrix, w))))
plt.figure(figsize=(12, 8))
plt.scatter(random_vols, random_rets, alpha=0.3, s=5, c='gray', label='Random Portfolios')
plt.plot(frontier_vols, frontier_rets, 'b-', linewidth=2, label='Efficient Frontier')
for ticker in tickers:
vol = np.sqrt(cov_matrix.loc[ticker, ticker])
ret = returns[ticker]
plt.scatter(vol, ret, s=100, marker='*', zorder=5)
plt.annotate(ticker, (vol, ret), xytext=(5, 5), textcoords='offset points')
plt.xlabel('Volatility')
plt.ylabel('Expected Return')
plt.title('Efficient Frontier')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
plot_efficient_frontier(annual_returns, cov_matrix)
Exercise 5.5: Portfolio Optimizer with Multiple Strategies (Open-ended)
Your Task:
Build a function that:
- Compares multiple optimization strategies (min vol, max sharpe, equal weight)
- Calculates statistics for each
- Returns a comparison DataFrame
Your implementation:
Click to reveal solution
def compare_strategies(returns: pd.Series, cov_matrix: pd.DataFrame) -> pd.DataFrame:
"""
Compare multiple portfolio optimization strategies.
Args:
returns: Expected returns
cov_matrix: Covariance matrix
Returns:
DataFrame comparing strategies
"""
n = len(returns)
strategies = {}
# Equal weight
equal_w = np.ones(n) / n
strategies['Equal Weight'] = equal_w
# Minimum volatility
constraints = [{'type': 'eq', 'fun': lambda w: np.sum(w) - 1}]
bounds = tuple((0, 1) for _ in range(n))
result = minimize(
lambda w: np.sqrt(np.dot(w.T, np.dot(cov_matrix, w))),
np.ones(n) / n,
method='SLSQP',
bounds=bounds,
constraints=constraints
)
if result['success']:
strategies['Min Volatility'] = result['x']
# Maximum Sharpe
result = minimize(
lambda w: -np.dot(w, returns) / np.sqrt(np.dot(w.T, np.dot(cov_matrix, w))),
np.ones(n) / n,
method='SLSQP',
bounds=bounds,
constraints=constraints
)
if result['success']:
strategies['Max Sharpe'] = result['x']
# Calculate stats
results = []
for name, weights in strategies.items():
ret = np.dot(weights, returns)
vol = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights)))
results.append({
'Strategy': name,
'Return': ret,
'Volatility': vol,
'Sharpe': ret / vol
})
return pd.DataFrame(results).set_index('Strategy')
# Test
comparison = compare_strategies(annual_returns, cov_matrix)
print(comparison.round(4))
Exercise 5.6: Complete Optimization Framework (Open-ended)
Your Task:
Build a PortfolioOptimizerPro class that:
- Supports multiple optimization objectives
- Handles various constraint types (bounds, sectors)
- Includes efficient frontier generation
- Provides visualization methods
Your implementation:
Click to reveal solution
class PortfolioOptimizerPro:
    """Professional portfolio optimization framework."""

    def __init__(self, returns: pd.Series, cov_matrix: pd.DataFrame,
                 tickers: list = None, rf: float = 0):
        self.returns = returns
        self.cov_matrix = cov_matrix
        self.tickers = tickers or list(returns.index)
        self.rf = rf
        self.n = len(returns)
        self.results = {}

    def _calc_stats(self, weights):
        ret = np.dot(weights, self.returns)
        vol = np.sqrt(np.dot(weights.T, np.dot(self.cov_matrix, weights)))
        return {'return': ret, 'volatility': vol, 'sharpe': (ret - self.rf) / vol}

    def optimize(self, objective: str = 'max_sharpe',
                 min_weight: float = 0, max_weight: float = 1,
                 sector_limits: dict = None) -> np.ndarray:
        """Run optimization with specified objective and constraints.

        sector_limits is assumed here to map a sector name to a tuple of
        (list of tickers, maximum total weight),
        e.g. {'Technology': (['AAPL', 'MSFT'], 0.50)}.
        """
        bounds = tuple((min_weight, max_weight) for _ in range(self.n))
        constraints = [{'type': 'eq', 'fun': lambda w: np.sum(w) - 1}]

        # One inequality per sector: cap - sum(sector weights) >= 0
        for members, cap in (sector_limits or {}).values():
            idx = [self.tickers.index(t) for t in members]
            constraints.append(
                {'type': 'ineq', 'fun': lambda w, idx=idx, cap=cap: cap - np.sum(w[idx])}
            )

        if objective == 'max_sharpe':
            # Use excess return so the objective matches the Sharpe ratio in _calc_stats
            obj_func = lambda w: -(np.dot(w, self.returns) - self.rf) / np.sqrt(np.dot(w.T, np.dot(self.cov_matrix, w)))
        elif objective == 'min_vol':
            obj_func = lambda w: np.sqrt(np.dot(w.T, np.dot(self.cov_matrix, w)))
        else:
            raise ValueError(f"Unknown objective: {objective}")

        result = minimize(
            obj_func,
            np.ones(self.n) / self.n,
            method='SLSQP',
            bounds=bounds,
            constraints=constraints
        )

        if result['success']:
            self.results[objective] = {
                'weights': result['x'],
                'stats': self._calc_stats(result['x'])
            }
            return result['x']
        return None

    def efficient_frontier(self, n_points: int = 30) -> pd.DataFrame:
        """Generate efficient frontier."""
        targets = np.linspace(self.returns.min(), self.returns.max(), n_points)
        frontier = []
        for target in targets:
            constraints = [
                {'type': 'eq', 'fun': lambda w: np.sum(w) - 1},
                {'type': 'eq', 'fun': lambda w, t=target: np.dot(w, self.returns) - t}
            ]
            result = minimize(
                lambda w: np.sqrt(np.dot(w.T, np.dot(self.cov_matrix, w))),
                np.ones(self.n) / self.n,
                method='SLSQP',
                bounds=tuple((0, 1) for _ in range(self.n)),
                constraints=constraints
            )
            if result['success']:
                frontier.append({
                    'return': np.dot(result['x'], self.returns),
                    'volatility': np.sqrt(np.dot(result['x'].T, np.dot(self.cov_matrix, result['x'])))
                })
        return pd.DataFrame(frontier)

    def plot(self):
        """Plot results."""
        frontier = self.efficient_frontier()
        plt.figure(figsize=(12, 8))
        plt.plot(frontier['volatility'], frontier['return'], 'b-', linewidth=2, label='Efficient Frontier')
        for name, data in self.results.items():
            stats = data['stats']
            plt.scatter(stats['volatility'], stats['return'], s=150, marker='D',
                        label=f"{name} (SR={stats['sharpe']:.2f})", zorder=5)
        plt.xlabel('Volatility')
        plt.ylabel('Return')
        plt.title('Portfolio Optimization Results')
        plt.legend()
        plt.grid(True, alpha=0.3)
        plt.show()
# Test
opt_pro = PortfolioOptimizerPro(annual_returns, cov_matrix, tickers)
opt_pro.optimize('max_sharpe')
opt_pro.optimize('min_vol')
opt_pro.plot()
Module Project: Complete Portfolio Optimization Workflow
Build a complete portfolio optimization workflow comparing multiple strategies.
Your Challenge:
- Run multiple optimization strategies
- Compare portfolio statistics
- Visualize weight allocations
- Generate the efficient frontier
- Summarize findings
# YOUR CODE HERE - Module Project
Click to reveal solution
# Complete Portfolio Optimization Project
# Step 1: Run multiple strategies
optimizer = PortfolioOptimizer(annual_returns, cov_matrix)
strategies = {
'Equal Weight': np.ones(n_assets) / n_assets,
'Min Volatility': optimizer.minimize_volatility(),
'Max Sharpe': optimizer.maximize_sharpe()
}
# Step 2: Compare statistics
print("Strategy Comparison")
print("=" * 70)
comparison_data = []
for name, weights in strategies.items():
stats = optimizer.get_stats(weights)
comparison_data.append({
'Strategy': name,
'Return': stats['return'],
'Volatility': stats['volatility'],
'Sharpe': stats['sharpe']
})
comparison_df = pd.DataFrame(comparison_data).set_index('Strategy')
print(comparison_df.round(4))
# Step 3: Visualize allocations
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
for ax, (name, weights) in zip(axes, strategies.items()):
bars = ax.barh(tickers, weights * 100)
ax.set_xlabel('Weight (%)')
ax.set_title(name)
ax.axvline(x=0, color='black', linewidth=0.5)
plt.tight_layout()
plt.show()
# Step 4: Summary
best_sharpe = comparison_df['Sharpe'].idxmax()
lowest_vol = comparison_df['Volatility'].idxmin()
print(f"\nKey Findings:")
print(f" Best Risk-Adjusted: {best_sharpe} (Sharpe = {comparison_df.loc[best_sharpe, 'Sharpe']:.4f})")
print(f" Lowest Risk: {lowest_vol} (Vol = {comparison_df.loc[lowest_vol, 'Volatility']*100:.2f}%)")
Key Takeaways
What You Learned
1. Optimization Fundamentals
- Minimize volatility or maximize Sharpe ratio
- Use scipy.optimize.minimize with constraints
2. Constraints
- Long-only: bounds = tuple((0, 1) for _ in range(n))
- Sum to 1: {'type': 'eq', 'fun': lambda w: np.sum(w) - 1}
- Position limits: Custom bounds
- Sector limits: Inequality constraints
3. Target Optimization
- Target return: Add return equality constraint
- Target volatility: Add volatility equality constraint
4. Trade-offs
- Each added constraint can only lower the in-sample optimum or leave it unchanged
- But more realistic, implementable portfolios
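The trade-off can be checked numerically. The sketch below solves the same long-only maximum-Sharpe problem twice, once without a position cap and once with a 30% cap, on made-up inputs (`mu`, `vols`, and `corr` are hypothetical values chosen so that the cap actually binds). It is an illustration under those assumptions, not a general proof.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical inputs for illustration only
mu = np.array([0.12, 0.10, 0.07, 0.05])
vols = np.array([0.20, 0.18, 0.10, 0.06])
corr = np.full((4, 4), 0.2) + 0.8 * np.eye(4)  # 1 on the diagonal, 0.2 elsewhere
cov = np.outer(vols, vols) * corr

def max_sharpe(max_weight):
    """Long-only maximum-Sharpe weights with a per-asset cap (rf assumed 0)."""
    n = len(mu)
    res = minimize(
        lambda w: -(w @ mu) / np.sqrt(w @ cov @ w),
        np.ones(n) / n,
        method='SLSQP',
        bounds=tuple((0.0, max_weight) for _ in range(n)),
        constraints=[{'type': 'eq', 'fun': lambda w: np.sum(w) - 1}],
    )
    w = res.x
    return w, (w @ mu) / np.sqrt(w @ cov @ w)

w_free, sharpe_free = max_sharpe(1.0)       # no effective cap
w_capped, sharpe_capped = max_sharpe(0.30)  # 30% position limit

print(f"Sharpe without cap: {sharpe_free:.4f}")
print(f"Sharpe with 30% cap: {sharpe_capped:.4f}")
```

The capped problem searches a subset of the uncapped feasible region, so its best achievable Sharpe ratio cannot be higher. Out of sample the ranking may differ, which is one reason constraints are still used in practice.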
Key Code Patterns
# Basic optimization
result = minimize(
objective_function,
initial_weights,
method='SLSQP',
bounds=bounds,
constraints=constraints
)
Coming Up Next
In Module 6: Advanced Portfolio Techniques, we'll explore Risk Parity, Black-Litterman, and Hierarchical Risk Parity.
Congratulations on completing Module 5!
Module 6: Advanced Portfolio Techniques
Course 3: Quantitative Finance
Part 2: Portfolio Theory
Learning Objectives
By the end of this module, you will be able to:
- Implement Risk Parity portfolios for equal risk contribution
- Apply the Black-Litterman model to incorporate views
- Build robust portfolios that reduce estimation error
- Use Hierarchical Risk Parity (HRP) for ML-based allocation
| Attribute | Value |
|---|---|
| Duration | ~2.5 hours |
| Exercises | 6 (3 guided + 3 open-ended) |
| Prerequisites | Module 5: Mean-Variance Optimization |
Setup and Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
from datetime import datetime, timedelta
from scipy.optimize import minimize
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import squareform
import warnings
warnings.filterwarnings('ignore')
pd.set_option('display.float_format', '{:.4f}'.format)
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
np.random.seed(42)
print('Libraries loaded successfully!')
Load Data
# Download multi-asset data
tickers = ['SPY', 'TLT', 'GLD', 'VNQ', 'EFA', 'EEM', 'IEF', 'DBC']
# SPY: US Equities, TLT: Long Bonds, GLD: Gold, VNQ: REITs
# EFA: Developed Intl, EEM: Emerging, IEF: Intermediate Bonds, DBC: Commodities
end_date = datetime.now()
start_date = end_date - timedelta(days=5*365)
print("Downloading multi-asset data...")
data = yf.download(tickers, start=start_date, end=end_date, progress=False)
# Handle MultiIndex columns
if isinstance(data.columns, pd.MultiIndex):
if 'Adj Close' in data.columns.get_level_values(0):
prices = data['Adj Close']
elif 'Close' in data.columns.get_level_values(0):
prices = data['Close']
else:
prices = data.xs('Close', axis=1, level=1)
else:
prices = data['Adj Close'] if 'Adj Close' in data.columns else data['Close']
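prices.columns = [str(col) for col in prices.columns]
# yfinance can return columns in alphabetical order; reorder them to match `tickers`
# so that positional labels (zip(tickers, ...), the P matrix) line up with the data
prices = prices[tickers]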
returns = prices.pct_change().dropna()
# Calculate statistics
annual_returns = returns.mean() * 252
annual_volatility = returns.std() * np.sqrt(252)
cov_matrix = returns.cov() * 252
corr_matrix = returns.corr()
n_assets = len(tickers)
print(f"\nData loaded: {len(prices)} trading days")
print(f"Assets: {list(prices.columns)}")
Section 6.1: Risk Parity
A 60/40 stock/bond portfolio sounds balanced, but stocks contribute ~90% of the portfolio's risk!
In this section, you will learn:
- How to calculate risk contribution by asset
- Building portfolios where each asset contributes equal risk
- Risk budgeting with custom allocations
6.1.1 Risk Contribution
The risk contribution of asset $i$ is:
$$RC_i = w_i \cdot \frac{\partial \sigma_p}{\partial w_i} = w_i \cdot \frac{(\Sigma w)_i}{\sigma_p}$$
For a risk parity portfolio: $$RC_i = RC_j \quad \forall i,j$$
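To see the formula at work, the sketch below applies it to a two-asset 60/40 portfolio. The volatilities (15% for stocks, 5% for bonds) and the 0.2 correlation are assumed round numbers for illustration, not estimates from the course data.

```python
import numpy as np

# Assumed inputs for illustration: 15% stock vol, 5% bond vol, 0.2 correlation
w = np.array([0.60, 0.40])          # stocks, bonds
vols = np.array([0.15, 0.05])
rho = 0.2
cov = np.array([
    [vols[0] ** 2,            rho * vols[0] * vols[1]],
    [rho * vols[0] * vols[1], vols[1] ** 2],
])

sigma_p = np.sqrt(w @ cov @ w)      # portfolio volatility
rc = w * (cov @ w) / sigma_p        # RC_i = w_i * (Sigma w)_i / sigma_p
rc_share = rc / rc.sum()            # share of total risk per asset

print(f"Portfolio volatility: {sigma_p*100:.2f}%")
print(f"Sum of risk contributions: {rc.sum()*100:.2f}%")
print(f"Stock share of risk: {rc_share[0]*100:.1f}%")
print(f"Bond share of risk: {rc_share[1]*100:.1f}%")
```

Under these assumptions the stock sleeve carries roughly 92% of the portfolio's risk despite holding 60% of the capital. The risk contributions also add up to the portfolio volatility, which is what makes them a clean decomposition.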
# Core risk functions
def portfolio_volatility(weights: np.ndarray, cov_matrix: pd.DataFrame) -> float:
    """Calculate portfolio volatility."""
    return np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights)))

def risk_contribution(weights: np.ndarray, cov_matrix: pd.DataFrame) -> np.ndarray:
    """Calculate risk contribution of each asset."""
    port_vol = portfolio_volatility(weights, cov_matrix)
    marginal_contrib = np.dot(cov_matrix, weights) / port_vol
    return weights * marginal_contrib

def risk_contribution_pct(weights: np.ndarray, cov_matrix: pd.DataFrame) -> np.ndarray:
    """Risk contribution as percentage of total."""
    rc = risk_contribution(weights, cov_matrix)
    return rc / rc.sum() * 100

print("Risk functions defined")
# Equal Weight Portfolio - Risk Contribution
equal_weights = np.array([1/n_assets] * n_assets)
print("Equal Weight Portfolio - Risk Contribution")
print("=" * 50)
rc_equal = risk_contribution_pct(equal_weights, cov_matrix)
for ticker, rc in zip(tickers, rc_equal):
print(f" {ticker}: {rc:.2f}%")
print(f"\nNotice: Even with equal weights, risk contribution varies widely!")
6.1.2 Risk Parity Optimization
def risk_parity_objective(weights: np.ndarray, cov_matrix: pd.DataFrame) -> float:
    """Objective: minimize deviation from equal risk contribution."""
    rc = risk_contribution(weights, cov_matrix)
    target = np.ones(len(weights)) / len(weights) * rc.sum()
    return np.sum((rc - target) ** 2)

def optimize_risk_parity(cov_matrix: pd.DataFrame) -> np.ndarray:
    """Find risk parity weights."""
    n = cov_matrix.shape[0]
    x0 = np.ones(n) / n
    constraints = [{'type': 'eq', 'fun': lambda w: np.sum(w) - 1}]
    bounds = tuple((0.01, 1) for _ in range(n))
    result = minimize(
        risk_parity_objective,
        x0,
        args=(cov_matrix,),
        method='SLSQP',
        bounds=bounds,
        constraints=constraints
    )
    return result['x'] if result['success'] else None
# Find Risk Parity portfolio
rp_weights = optimize_risk_parity(cov_matrix)
print("Risk Parity Portfolio")
print("=" * 55)
print(f"\n{'Asset':<8} {'Weight':>10} {'Risk Contrib':>15}")
print("-" * 40)
rc_rp = risk_contribution_pct(rp_weights, cov_matrix)
for ticker, w, rc in zip(tickers, rp_weights, rc_rp):
print(f"{ticker:<8} {w*100:>9.2f}% {rc:>14.2f}%")
print(f"\nPortfolio Volatility: {portfolio_volatility(rp_weights, cov_matrix)*100:.2f}%")
Exercise 6.1: Risk Budgeting (Guided)
Your Task: Implement risk budgeting where you can specify custom risk allocations per asset.
Fill in the blanks:
Click to reveal solution
def risk_budgeting_objective(weights: np.ndarray, cov_matrix: pd.DataFrame,
risk_budget: np.ndarray) -> float:
"""Objective: match specified risk budget."""
rc = risk_contribution(weights, cov_matrix)
rc_pct = rc / rc.sum()
return np.sum((rc_pct - risk_budget) ** 2)
def optimize_risk_budget(cov_matrix: pd.DataFrame,
risk_budget: np.ndarray) -> np.ndarray:
"""Find weights for a given risk budget."""
n = cov_matrix.shape[0]
x0 = np.ones(n) / n
constraints = [{'type': 'eq', 'fun': lambda w: np.sum(w) - 1}]
bounds = tuple((0.01, 1) for _ in range(n))
result = minimize(
risk_budgeting_objective,
x0,
args=(cov_matrix, risk_budget),
method='SLSQP',
bounds=bounds,
constraints=constraints
)
return result['x'] if result['success'] else None
# Test: 60% equity risk (SPY, EFA, EEM), 40% defensive
risk_budget = np.array([0.20, 0.10, 0.10, 0.10, 0.20, 0.20, 0.05, 0.05])
rb_weights = optimize_risk_budget(cov_matrix, risk_budget)
print("Risk Budgeting Portfolio")
rc_rb = risk_contribution_pct(rb_weights, cov_matrix)
for ticker, w, rc, target in zip(tickers, rb_weights, rc_rb, risk_budget*100):
print(f"{ticker}: Weight={w*100:.2f}%, RC={rc:.2f}%, Target={target:.2f}%")
Section 6.2: Black-Litterman Model
Mean-variance optimization (MVO) depends on expected-return estimates, which are noisy and unreliable. Black-Litterman combines market equilibrium returns with investor views.
In this section, you will learn:
- Deriving equilibrium returns from market weights
- Expressing and incorporating views
- Computing posterior expected returns
6.2.1 Equilibrium Returns
Equilibrium returns (implied by market): $$\Pi = \delta \Sigma w_{mkt}$$
Where:
- $\delta$ = risk aversion coefficient (typically 2-4)
- $\Sigma$ = covariance matrix
- $w_{mkt}$ = market capitalization weights
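The equilibrium formula is a reverse optimization: it asks which expected returns would make the market portfolio optimal for an investor with risk aversion $\delta$. A quick round trip makes this concrete. The sketch below uses a hypothetical three-asset market (`demo_vols`, `demo_corr`, and `w_mkt` are made-up values) and checks that the unconstrained mean-variance optimum for $\Pi$ recovers the market weights.

```python
import numpy as np

# Hypothetical three-asset market for illustration only
demo_vols = np.array([0.16, 0.07, 0.15])
demo_corr = np.array([
    [1.0, -0.2, 0.1],
    [-0.2, 1.0, 0.0],
    [0.1,  0.0, 1.0],
])
demo_cov = np.outer(demo_vols, demo_vols) * demo_corr
w_mkt = np.array([0.6, 0.3, 0.1])
delta = 2.5

# Reverse optimization: implied equilibrium returns
pi = delta * demo_cov @ w_mkt

# Forward optimization: maximize w'pi - (delta / 2) * w' Sigma w without constraints.
# The first-order condition gives w = (delta * Sigma)^-1 pi.
w_implied = np.linalg.solve(delta * demo_cov, pi)

print("Implied equilibrium returns:", np.round(pi, 4))
print("Recovered weights:", np.round(w_implied, 4))
```

Because the two steps are exact inverses, the recovered weights match `w_mkt`. This is why Black-Litterman portfolios stay close to the market portfolio when no views are supplied.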
# Calculate equilibrium returns
market_weights = np.array([1/n_assets] * n_assets) # Simplified
delta = 2.5 # Risk aversion
equilibrium_returns = delta * np.dot(cov_matrix, market_weights)
print("Equilibrium Returns (Implied by Market)")
print("=" * 45)
for ticker, ret in zip(tickers, equilibrium_returns):
print(f" {ticker}: {ret*100:.2f}%")
6.2.2 Black-Litterman Formula
Posterior expected returns: $$E[R] = [(\tau\Sigma)^{-1} + P'\Omega^{-1}P]^{-1} [(\tau\Sigma)^{-1}\Pi + P'\Omega^{-1}Q]$$
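Two limiting cases help build intuition for the posterior formula: a very uncertain view (large $\Omega$) should leave the equilibrium returns almost unchanged, while a nearly certain view (tiny $\Omega$) should pull the viewed asset's return to the view value $Q$. The sketch below checks both on a hypothetical two-asset example. The helper name `bl_posterior_mean` and all numbers are assumptions for illustration.

```python
import numpy as np

def bl_posterior_mean(pi, cov, P, Q, omega, tau=0.05):
    """Posterior mean from the Black-Litterman formula above."""
    tau_sigma_inv = np.linalg.inv(tau * cov)
    omega_inv = np.linalg.inv(omega)
    precision = tau_sigma_inv + P.T @ omega_inv @ P
    rhs = tau_sigma_inv @ pi + P.T @ omega_inv @ Q
    return np.linalg.solve(precision, rhs)

# Hypothetical two-asset inputs
demo_cov = np.array([[0.04, 0.006],
                     [0.006, 0.01]])
demo_pi = np.array([0.05, 0.02])   # equilibrium returns
P = np.array([[1.0, 0.0]])         # one absolute view on asset 0
Q = np.array([0.08])               # asset 0 is expected to return 8%

vague = bl_posterior_mean(demo_pi, demo_cov, P, Q, omega=np.array([[1e6]]))
sure = bl_posterior_mean(demo_pi, demo_cov, P, Q, omega=np.array([[1e-8]]))

print("Very uncertain view:", np.round(vague, 4))
print("Nearly certain view:", np.round(sure, 4))
```

Note that the nearly certain view also moves asset 1, even though the view only mentions asset 0. The covariance between the two assets spreads the information across the portfolio.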
def black_litterman(cov_matrix: np.ndarray, market_weights: np.ndarray,
                    P: np.ndarray, Q: np.ndarray, omega: np.ndarray,
                    tau: float = 0.05, delta: float = 2.5) -> tuple:
    """
    Black-Litterman model.

    Args:
        cov_matrix: Asset covariance matrix
        market_weights: Market cap weights
        P: View matrix (k x n)
        Q: View vector (k x 1)
        omega: View uncertainty matrix (k x k)
        tau: Scalar (0.025-0.05 typical)
        delta: Risk aversion coefficient

    Returns:
        Tuple of (posterior_returns, posterior_cov). posterior_cov is the
        covariance of the posterior mean estimate; add it to cov_matrix to
        obtain the posterior covariance of returns.
    """
    # Equilibrium returns
    pi = delta * np.dot(cov_matrix, market_weights)

    # Scaled covariance
    tau_sigma = tau * cov_matrix
    tau_sigma_inv = np.linalg.inv(tau_sigma)
    omega_inv = np.linalg.inv(omega)

    # Posterior precision and covariance
    posterior_precision = tau_sigma_inv + np.dot(P.T, np.dot(omega_inv, P))
    posterior_cov = np.linalg.inv(posterior_precision)

    # Posterior mean
    posterior_returns = np.dot(posterior_cov,
                               np.dot(tau_sigma_inv, pi) + np.dot(P.T, np.dot(omega_inv, Q)))
    return posterior_returns, posterior_cov

print("Black-Litterman function defined")
# Example views:
# View 1: SPY will return 8% (absolute)
# View 2: EEM will outperform EFA by 2% (relative)
# P matrix: rows=views, columns=assets
# SPY=0, TLT=1, GLD=2, VNQ=3, EFA=4, EEM=5, IEF=6, DBC=7
P = np.array([
[1, 0, 0, 0, 0, 0, 0, 0], # SPY
[0, 0, 0, 0, -1, 1, 0, 0] # EEM - EFA
])
Q = np.array([0.08, 0.02]) # View returns
# Omega: view uncertainty (diagonal)
omega = np.diag([0.001, 0.002])
print("Investor Views:")
print(" View 1: SPY returns 8% (high confidence)")
print(" View 2: EEM outperforms EFA by 2% (moderate confidence)")
# Apply Black-Litterman
bl_returns, bl_cov = black_litterman(
cov_matrix.values, market_weights, P, Q, omega
)
print("Black-Litterman Expected Returns")
print("=" * 55)
print(f"\n{'Asset':<8} {'Equilibrium':>12} {'BL Return':>12} {'Change':>10}")
print("-" * 50)
for i, ticker in enumerate(tickers):
eq_ret = equilibrium_returns[i]
bl_ret = bl_returns[i]
change = bl_ret - eq_ret
print(f"{ticker:<8} {eq_ret*100:>11.2f}% {bl_ret*100:>11.2f}% {change*100:>+9.2f}%")
Exercise 6.2: Black-Litterman Optimizer (Guided)
Your Task: Build a complete Black-Litterman workflow with optimization.
Fill in the blanks:
Click to reveal solution
def bl_optimize(returns: np.ndarray, cov_matrix: np.ndarray,
long_only: bool = True) -> np.ndarray:
"""Optimize portfolio using Black-Litterman returns."""
n = len(returns)
def neg_sharpe(w):
ret = np.dot(w, returns)
vol = np.sqrt(np.dot(w.T, np.dot(cov_matrix, w)))
return -ret / vol
constraints = [{'type': 'eq', 'fun': lambda w: np.sum(w) - 1}]
bounds = tuple((0, 1) for _ in range(n)) if long_only else None
result = minimize(
neg_sharpe,
np.ones(n) / n,
method='SLSQP',
bounds=bounds,
constraints=constraints
)
return result['x'] if result['success'] else None
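# bl_cov is the uncertainty of the posterior mean estimate, not the covariance of returns.
# The posterior covariance of returns adds it to the original covariance matrix.
bl_weights = bl_optimize(bl_returns, cov_matrix.values + bl_cov)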
print("Black-Litterman Optimized Portfolio")
print("=" * 45)
for ticker, w in zip(tickers, bl_weights):
print(f" {ticker}: {w*100:.2f}%")
Exercise 6.3: Custom Views Implementation (Open-ended)
Your Task:
Build a function that:
- Takes a list of view specifications (dictionaries)
- Constructs the P matrix and Q vector automatically
- Allows specifying confidence levels
- Returns Black-Litterman expected returns
Your implementation:
Click to reveal solution
def build_views(tickers: list, views: list) -> tuple:
"""
Build P, Q, omega matrices from view specifications.
Args:
tickers: List of ticker symbols
views: List of dicts with keys:
- 'asset': ticker for absolute view, OR
- 'long': ticker to go long, 'short': ticker to short
- 'return': expected return
- 'confidence': 'high', 'medium', 'low'
Returns:
Tuple of (P, Q, omega)
"""
n_assets = len(tickers)
n_views = len(views)
ticker_idx = {t: i for i, t in enumerate(tickers)}
confidence_map = {'high': 0.001, 'medium': 0.002, 'low': 0.005}
P = np.zeros((n_views, n_assets))
Q = np.zeros(n_views)
omega_diag = np.zeros(n_views)
for i, view in enumerate(views):
Q[i] = view['return']
omega_diag[i] = confidence_map.get(view.get('confidence', 'medium'), 0.002)
if 'asset' in view: # Absolute view
P[i, ticker_idx[view['asset']]] = 1
else: # Relative view
P[i, ticker_idx[view['long']]] = 1
P[i, ticker_idx[view['short']]] = -1
return P, Q, np.diag(omega_diag)
# Test
views = [
{'asset': 'SPY', 'return': 0.10, 'confidence': 'high'},
{'long': 'EEM', 'short': 'EFA', 'return': 0.03, 'confidence': 'medium'},
{'asset': 'GLD', 'return': 0.05, 'confidence': 'low'}
]
P, Q, omega = build_views(tickers, views)
bl_ret, bl_cov = black_litterman(cov_matrix.values, market_weights, P, Q, omega)
print("BL Returns with Custom Views:")
for t, r in zip(tickers, bl_ret):
print(f" {t}: {r*100:.2f}%")
Section 6.3: Robust Optimization
MVO is highly sensitive to input estimates. Robust methods reduce this sensitivity.
In this section, you will learn:
- Shrinkage estimators for covariance
- Resampled efficiency
- Evaluating weight stability
6.3.1 Shrinkage Estimators
def ledoit_wolf_shrinkage(returns: pd.DataFrame, shrinkage: float = 0.2) -> np.ndarray:
    """
    Simplified Ledoit-Wolf-style shrinkage.

    Shrinks the sample covariance toward a scaled identity matrix. The
    shrinkage intensity is a fixed input here; the full Ledoit-Wolf
    estimator chooses it from the data.
    """
    sample_cov = returns.cov().values * 252
    p = sample_cov.shape[0]

    # Target: scaled identity (average variance on the diagonal)
    mu = np.trace(sample_cov) / p
    target = mu * np.eye(p)

    # Shrunk covariance
    shrunk_cov = shrinkage * target + (1 - shrinkage) * sample_cov
    return shrunk_cov

shrinkage_intensity = 0.2
shrunk_cov = ledoit_wolf_shrinkage(returns, shrinkage_intensity)
print(f"Shrinkage applied: sample covariance pulled {shrinkage_intensity:.0%} toward scaled identity")
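One way to see what shrinkage buys you is the condition number of the covariance matrix, which drives how strongly an optimizer amplifies estimation noise. The sketch below uses simulated data with few observations relative to the number of assets (the sample size, seed, and helper name `shrink_to_identity` are assumptions for illustration) and compares the matrix before and after shrinkage.

```python
import numpy as np

rng = np.random.default_rng(0)

# Few observations relative to assets -> noisy sample covariance
sample = rng.normal(size=(30, 10))          # 30 observations, 10 assets
sample_cov = np.cov(sample, rowvar=False)

def shrink_to_identity(cov, shrinkage):
    """Blend a covariance matrix with a scaled identity target."""
    p = cov.shape[0]
    mu = np.trace(cov) / p                  # average variance
    return shrinkage * mu * np.eye(p) + (1 - shrinkage) * cov

shrunk = shrink_to_identity(sample_cov, 0.2)

cond_sample = np.linalg.cond(sample_cov)
cond_shrunk = np.linalg.cond(shrunk)
print(f"Condition number (sample): {cond_sample:.1f}")
print(f"Condition number (shrunk): {cond_shrunk:.1f}")
```

Shrinking toward a scaled identity pulls every eigenvalue toward the average variance, so the ratio of largest to smallest eigenvalue falls while the total variance (the trace) is preserved.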
6.3.2 Resampled Efficiency
def resampled_optimization(returns: np.ndarray, cov_matrix: np.ndarray,
                           n_samples: int = 100) -> tuple:
    """
    Resampled efficient portfolio.

    Draws simulated return histories from a multivariate normal fitted to the
    annualized inputs, re-optimizes on each draw, and averages the weights.
    """
    n_assets = len(returns)
    all_weights = []
    np.random.seed(42)

    for _ in range(n_samples):
        # Simulate one year of daily returns. Both inputs are annualized,
        # so convert the mean and the covariance to daily units.
        sim_returns = np.random.multivariate_normal(
            returns / 252, cov_matrix / 252, size=252
        )
        sim_mean = sim_returns.mean(axis=0) * 252
        sim_cov = np.cov(sim_returns.T) * 252

        # Optimize
        def neg_sharpe(w):
            return -np.dot(w, sim_mean) / np.sqrt(np.dot(w.T, np.dot(sim_cov, w)))

        result = minimize(
            neg_sharpe,
            np.ones(n_assets) / n_assets,
            method='SLSQP',
            bounds=tuple((0, 1) for _ in range(n_assets)),
            constraints=[{'type': 'eq', 'fun': lambda w: np.sum(w) - 1}]
        )
        if result['success']:
            all_weights.append(result['x'])

    # Average weights
    avg_weights = np.mean(all_weights, axis=0)
    avg_weights = avg_weights / avg_weights.sum()
    return avg_weights, np.array(all_weights)
print("Running resampled optimization (100 samples)...")
resampled_weights, all_resampled = resampled_optimization(
annual_returns.values, cov_matrix.values
)
print("\nResampled Efficient Portfolio")
for ticker, w in zip(tickers, resampled_weights):
print(f" {ticker}: {w*100:.2f}%")
Exercise 6.4: Weight Stability Analysis (Open-ended)
Your Task:
Build a function that:
- Takes the resampled weight matrix
- Calculates statistics for each asset (mean, std, confidence interval)
- Visualizes the weight uncertainty with box plots
- Returns a DataFrame with weight statistics
Your implementation:
Click to reveal solution
def analyze_weight_stability(all_weights: np.ndarray,
tickers: list) -> pd.DataFrame:
"""
Analyze stability of resampled weights.
Args:
all_weights: Array of shape (n_samples, n_assets)
tickers: List of ticker symbols
Returns:
DataFrame with weight statistics
"""
stats = []
for i, ticker in enumerate(tickers):
weights_i = all_weights[:, i]
stats.append({
'Ticker': ticker,
'Mean': weights_i.mean(),
'Std': weights_i.std(),
'Min': weights_i.min(),
'Max': weights_i.max(),
'CI_Lower': np.percentile(weights_i, 2.5),
'CI_Upper': np.percentile(weights_i, 97.5)
})
df = pd.DataFrame(stats).set_index('Ticker')
# Visualize
fig, ax = plt.subplots(figsize=(12, 6))
bp = ax.boxplot([all_weights[:, i] * 100 for i in range(len(tickers))],
positions=range(len(tickers)))
ax.set_xticks(range(len(tickers)))
ax.set_xticklabels(tickers)
ax.set_ylabel('Weight (%)')
ax.set_title('Weight Uncertainty from Resampling')
ax.axhline(y=100/len(tickers), color='red', linestyle='--', alpha=0.5, label='Equal Weight')
ax.legend()
plt.tight_layout()
plt.show()
return df
# Test
stability_df = analyze_weight_stability(all_resampled, tickers)
print(stability_df.round(4))
Section 6.4: Hierarchical Risk Parity (HRP)
HRP is a modern ML-based approach that doesn't require return estimates.
In this section, you will learn:
- Hierarchical clustering of assets
- Quasi-diagonalization
- Recursive bisection for allocation
6.4.1 HRP Algorithm
- Tree Clustering: Build hierarchy based on correlation distance
- Quasi-Diagonalization: Reorder assets to cluster similar ones
- Recursive Bisection: Split and allocate inversely to variance
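Two small calculations sit at the heart of these steps: turning a correlation into a distance for clustering, and splitting capital between two clusters in inverse proportion to their variance. The sketch below shows both in isolation. The helper names `corr_to_distance` and `bisection_split` and the example variances are hypothetical, chosen only to illustrate the arithmetic.

```python
import numpy as np

def corr_to_distance(rho):
    """Correlation distance used for clustering: sqrt(0.5 * (1 - rho))."""
    return np.sqrt(0.5 * (1.0 - rho))

# Perfectly correlated assets are at distance 0, perfectly
# anti-correlated assets are at the maximum distance of 1.
for rho in (1.0, 0.0, -1.0):
    print(f"rho = {rho:+.1f} -> distance = {corr_to_distance(rho):.4f}")

def bisection_split(var_left, var_right):
    """Share of capital for each half: the lower-variance half gets more."""
    alpha = 1.0 - var_left / (var_left + var_right)
    return alpha, 1.0 - alpha

# Hypothetical cluster variances: left is four times as risky as right
left_share, right_share = bisection_split(0.04, 0.01)
print(f"Left cluster share: {left_share:.2f}")
print(f"Right cluster share: {right_share:.2f}")
```

With these example variances the riskier left cluster receives 20% of the capital and the right cluster 80%. HRP applies this split recursively down the tree, so no matrix inversion or return estimate is needed.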
def correlation_distance(corr_matrix: np.ndarray) -> np.ndarray:
    """Convert correlation to distance."""
    # Clip guards against tiny negative values caused by floating-point error
    return np.sqrt(np.clip(0.5 * (1 - corr_matrix), 0.0, None))

def get_quasi_diag(link: np.ndarray) -> list:
    """Get quasi-diagonal order from linkage."""
    link = link.astype(int)
    sort_ix = pd.Series([link[-1, 0], link[-1, 1]])
    num_items = link[-1, 3]
    # Replace each cluster id with its two children until only leaves remain
    while sort_ix.max() >= num_items:
        sort_ix.index = range(0, sort_ix.shape[0] * 2, 2)
        df0 = sort_ix[sort_ix >= num_items]
        i = df0.index
        j = df0.values - num_items
        sort_ix[i] = link[j, 0]
        df0 = pd.Series(link[j, 1], index=i + 1)
        sort_ix = pd.concat([sort_ix, df0]).sort_index()
        sort_ix.index = range(sort_ix.shape[0])
    return sort_ix.tolist()

def get_cluster_var(cov: pd.DataFrame, cluster_items: list) -> float:
    """Calculate variance of a cluster using inverse-variance weights."""
    cov_slice = cov.iloc[cluster_items, cluster_items]
    w = 1 / np.diag(cov_slice)
    w = w / w.sum()
    return np.dot(w, np.dot(cov_slice, w))

def hrp_allocation(cov: pd.DataFrame, sort_ix: list) -> pd.Series:
    """Recursive bisection allocation."""
    w = pd.Series(1.0, index=sort_ix)
    cluster_items = [sort_ix]
    while len(cluster_items) > 0:
        # Split every cluster with more than one asset into two halves
        cluster_items = [i[j:k] for i in cluster_items
                         for j, k in ((0, len(i) // 2), (len(i) // 2, len(i)))
                         if len(i) > 1]
        for i in range(0, len(cluster_items), 2):
            c0 = cluster_items[i]
            c1 = cluster_items[i + 1]
            var0 = get_cluster_var(cov, c0)
            var1 = get_cluster_var(cov, c1)
            alpha = 1 - var0 / (var0 + var1)
            w[c0] *= alpha
            w[c1] *= 1 - alpha
    return w

print("HRP functions defined")
# Step 1: Calculate distance and cluster
dist_matrix = correlation_distance(corr_matrix.values)
dist_array = squareform(dist_matrix, checks=False)
link = linkage(dist_array, method='ward')
# Plot dendrogram
plt.figure(figsize=(12, 6))
dendrogram(link, labels=tickers, leaf_rotation=0)
plt.title('Hierarchical Clustering of Assets', fontsize=14, fontweight='bold')
plt.xlabel('Asset')
plt.ylabel('Distance')
plt.tight_layout()
plt.show()
# Step 2 & 3: Quasi-diagonal order and allocation
sort_ix = get_quasi_diag(link)
print(f"Quasi-diagonal ordering: {' -> '.join([tickers[i] for i in sort_ix])}")
cov_df = pd.DataFrame(cov_matrix.values, index=range(n_assets), columns=range(n_assets))
hrp_w = hrp_allocation(cov_df, sort_ix)
# Map to original order
hrp_weights = np.zeros(n_assets)
for i, idx in enumerate(sort_ix):
    hrp_weights[idx] = hrp_w.iloc[i]
print("\nHierarchical Risk Parity Weights:")
for ticker, w in zip(tickers, hrp_weights):
    print(f" {ticker}: {w*100:.2f}%")
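The recursive bisection step splits weight between two sibling clusters in inverse proportion to their variances. A minimal sketch of a single split, using made-up cluster variances (the helper name `bisection_split` and the numbers are illustrative assumptions, not part of the course code):

```python
def bisection_split(var_left: float, var_right: float) -> tuple:
    """Return (weight_left, weight_right) for one HRP bisection step."""
    # Same formula as hrp_allocation: the riskier cluster gets the smaller share
    alpha = 1 - var_left / (var_left + var_right)
    return alpha, 1 - alpha

# Hypothetical cluster variances: left cluster is four times as risky
left, right = bisection_split(0.04, 0.01)
print(f"left: {left:.2f}, right: {right:.2f}")
```

With these inputs the riskier left cluster receives 20% of the weight and the right cluster 80%; the two shares always sum to one, so the final HRP weights do as well.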
Exercise 6.5: Complete HRP Class (Guided)
Your Task: Build a complete HRP class encapsulating all steps.
Fill in the blanks:
Solution:
class HierarchicalRiskParity:
    """Hierarchical Risk Parity portfolio allocation."""

    def __init__(self, returns: pd.DataFrame):
        self.returns = returns
        self.tickers = list(returns.columns)
        self.n_assets = len(self.tickers)
        self.cov_matrix = returns.cov() * 252
        self.corr_matrix = returns.corr()
        self.weights = None

    def fit(self):
        """Run HRP algorithm."""
        dist = correlation_distance(self.corr_matrix.values)
        dist_array = squareform(dist, checks=False)
        self.link = linkage(dist_array, method='ward')
        self.sort_ix = get_quasi_diag(self.link)
        cov_df = pd.DataFrame(self.cov_matrix.values,
                              index=range(self.n_assets),
                              columns=range(self.n_assets))
        hrp_w = hrp_allocation(cov_df, self.sort_ix)
        self.weights = np.zeros(self.n_assets)
        for i, idx in enumerate(self.sort_ix):
            self.weights[idx] = hrp_w.iloc[i]
        return self

    def get_weights(self) -> dict:
        return dict(zip(self.tickers, self.weights))

hrp = HierarchicalRiskParity(returns).fit()
print("HRP Weights:")
for ticker, w in hrp.get_weights().items():
    print(f" {ticker}: {w*100:.2f}%")
Exercise 6.6: Complete Portfolio Allocator (Open-ended)
Your Task:
Build an AdvancedPortfolioAllocator class that:
- Implements all techniques: Equal Weight, Risk Parity, HRP
- Calculates performance statistics for each
- Provides comparison methods
- Includes visualization
Your implementation:
Solution:
class AdvancedPortfolioAllocator:
    """Advanced portfolio allocation with multiple techniques."""

    def __init__(self, returns: pd.DataFrame):
        self.returns = returns
        self.tickers = list(returns.columns)
        self.n_assets = len(self.tickers)
        self.cov_matrix = returns.cov() * 252
        self.annual_returns = returns.mean() * 252
        self.results = {}

    def equal_weight(self):
        """Equal weight allocation."""
        self.results['Equal Weight'] = np.ones(self.n_assets) / self.n_assets
        return self

    def risk_parity(self):
        """Risk parity allocation."""
        self.results['Risk Parity'] = optimize_risk_parity(self.cov_matrix)
        return self

    def hrp(self):
        """HRP allocation."""
        hrp_model = HierarchicalRiskParity(self.returns).fit()
        self.results['HRP'] = hrp_model.weights
        return self

    def run_all(self):
        """Run all allocation methods."""
        return self.equal_weight().risk_parity().hrp()

    def get_stats(self, weights: np.ndarray) -> dict:
        """Get portfolio statistics."""
        ret = np.dot(weights, self.annual_returns)
        vol = np.sqrt(np.dot(weights.T, np.dot(self.cov_matrix, weights)))
        return {'return': ret, 'volatility': vol, 'sharpe': ret / vol}

    def compare(self) -> pd.DataFrame:
        """Compare all methods."""
        if not self.results:
            self.run_all()
        data = []
        for name, weights in self.results.items():
            stats = self.get_stats(weights)
            data.append({
                'Method': name,
                'Return': stats['return'],
                'Volatility': stats['volatility'],
                'Sharpe': stats['sharpe']
            })
        return pd.DataFrame(data).set_index('Method')

    def plot(self):
        """Plot weight comparison."""
        if not self.results:
            self.run_all()
        df = pd.DataFrame(self.results, index=self.tickers) * 100
        ax = df.plot(kind='bar', figsize=(12, 6), width=0.8)
        ax.set_ylabel('Weight (%)')
        ax.set_title('Portfolio Allocation Comparison')
        plt.xticks(rotation=0)
        plt.legend(loc='upper right')
        plt.tight_layout()
        plt.show()

# Test
allocator = AdvancedPortfolioAllocator(returns)
allocator.run_all()
print(allocator.compare().round(4))
allocator.plot()
Module Project: Advanced Portfolio Allocation System
Build a complete system implementing all advanced techniques.
Your Challenge:
- Implement Equal Weight, Risk Parity, and HRP
- Compare all methods on return, risk, and Sharpe
- Visualize weight allocations
- Analyze risk contribution for each method
- Summarize findings and recommendations
# YOUR CODE HERE - Module Project
Solution:
# Complete Advanced Allocation Project
# Step 1: Run all methods
methods = {
'Equal Weight': equal_weights,
'Risk Parity': rp_weights,
'HRP': hrp_weights
}
# Step 2: Compare statistics
print("Method Comparison")
print("=" * 65)
print(f"{'Method':<15} {'Return':>10} {'Volatility':>12} {'Sharpe':>10}")
print("-" * 55)
for name, weights in methods.items():
    ret = np.dot(weights, annual_returns) * 100
    vol = portfolio_volatility(weights, cov_matrix) * 100
    sharpe = (ret / 100) / (vol / 100)
    print(f"{name:<15} {ret:>9.2f}% {vol:>11.2f}% {sharpe:>10.3f}")
# Step 3: Visualize allocations
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
for ax, (name, weights) in zip(axes, methods.items()):
    ax.barh(tickers, weights * 100)
    ax.set_xlabel('Weight (%)')
    ax.set_title(name)
plt.tight_layout()
plt.show()
# Step 4: Risk contribution analysis
print("\nRisk Contribution Analysis")
print("=" * 60)
for name, weights in methods.items():
    rc = risk_contribution_pct(weights, cov_matrix)
    rc_std = np.std(rc)
    print(f"{name}: RC Std Dev = {rc_std:.2f}% (lower is more balanced)")
# Step 5: Summary
print("\nRecommendations:")
print(" - Use Risk Parity for balanced risk contribution")
print(" - Use HRP when you distrust return forecasts")
print(" - Equal Weight provides a simple baseline")
Key Takeaways
What You Learned
1. Risk Parity
- Allocate so each asset contributes equal risk
- Extends to risk budgeting for custom allocations
- Needs only a covariance estimate, so it avoids MVO's sensitivity to expected-return forecasts
2. Black-Litterman
- Combines market equilibrium with investor views
- More stable than pure MVO
- Widely used by institutions
3. Robust Optimization
- Shrinkage reduces estimation error
- Resampling provides weight uncertainty estimates
4. Hierarchical Risk Parity
- No return estimates needed
- Uses clustering for asset grouping
- Often outperforms MVO out-of-sample
When to Use Each Method
| Method | Best For |
|---|---|
| Risk Parity | Diversified multi-asset portfolios |
| Black-Litterman | When you have specific market views |
| Resampling | When uncertain about estimates |
| HRP | When you distrust return forecasts |
Coming Up Next
In Part 3: Risk Modeling, we'll dive into VaR, CVaR, stress testing, and factor models.
Congratulations on completing Module 6!
Module 7: Value at Risk (VaR)
Course 3: Quantitative Finance & Portfolio Theory
Part 3: Risk Modeling
Learning Objectives
By the end of this module, you will be able to:
- Calculate VaR using parametric, historical, and Monte Carlo methods
- Understand distributional assumptions behind each VaR approach
- Scale VaR across different time horizons
- Implement VaR backtesting and violation analysis
| Attribute | Value |
|---|---|
| Duration | ~2.5 hours |
| Exercises | 6 (3 guided + 3 open-ended) |
| Prerequisites | Module 6: Advanced Portfolio Techniques |
Setup and Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
from datetime import datetime, timedelta
from scipy import stats
import warnings
warnings.filterwarnings('ignore')
pd.set_option('display.float_format', '{:.4f}'.format)
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
np.random.seed(42)
print('Libraries loaded successfully!')
Load Data
# Download portfolio data
tickers = ['SPY', 'QQQ', 'TLT', 'GLD']
end_date = datetime.now()
start_date = end_date - timedelta(days=10*365)
print("Downloading historical data...")
data = yf.download(tickers, start=start_date, end=end_date, progress=False)
# Handle MultiIndex columns
if isinstance(data.columns, pd.MultiIndex):
    if 'Adj Close' in data.columns.get_level_values(0):
        prices = data['Adj Close']
    elif 'Close' in data.columns.get_level_values(0):
        prices = data['Close']
    else:
        prices = data.iloc[:, :len(tickers)]
else:
    prices = data['Adj Close'] if 'Adj Close' in data.columns else data['Close']
prices.columns = [str(col) for col in prices.columns]
# yfinance typically returns ticker columns in alphabetical order; restore the
# order of `tickers` so each portfolio weight lines up with the intended asset
prices = prices[tickers]
returns = prices.pct_change().dropna()
# Create a portfolio (60% equities, 40% defensive)
portfolio_weights = np.array([0.40, 0.20, 0.25, 0.15])
portfolio_returns = returns.dot(portfolio_weights)
print(f"\nData loaded: {len(prices)} trading days")
print(f"Portfolio: {dict(zip(tickers, portfolio_weights))}")
Section 7.1: VaR Fundamentals
Value at Risk (VaR) answers a simple but powerful question: "What loss will we not exceed over a given period at a given confidence level?" VaR is a threshold, not a worst case: losses beyond it still occur with probability $1 - \alpha$.
In this section, you will learn:
- The definition and interpretation of VaR
- Common confidence levels and time horizons
- How to express VaR in percentage and dollar terms
7.1.1 VaR Definition
VaR at confidence level $\alpha$ is the $\alpha$-th percentile of the loss distribution:
$$P(Loss > VaR_{\alpha}) = 1 - \alpha$$
Common parameters:
- Confidence level: 95% or 99%
- Time horizon: 1 day, 10 days, 1 month
- Position size: Dollar amount at risk
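The definition can be checked numerically: if VaR is the $\alpha$-quantile of losses, then the fraction of losses above it should be close to $1 - \alpha$. A minimal sketch on synthetic data (the normal return parameters and the seed are assumptions chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic daily returns (assumed normal purely for this illustration)
synthetic_returns = rng.normal(0.0005, 0.01, 100_000)
losses = -synthetic_returns           # losses are negated returns

alpha = 0.95
var_alpha = np.quantile(losses, alpha)        # alpha-quantile of the loss distribution
exceed_rate = np.mean(losses > var_alpha)     # empirical P(Loss > VaR)

print(f"95% VaR: {var_alpha:.4%}")
print(f"Share of days with loss above VaR: {exceed_rate:.2%}")
```

The exceedance share lands at roughly 5%, which is exactly what $P(Loss > VaR_{\alpha}) = 1 - \alpha$ states for $\alpha = 0.95$.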
# Visualize the concept of VaR
fig, ax = plt.subplots(figsize=(12, 6))
# Histogram of returns
n, bins, patches = ax.hist(portfolio_returns * 100, bins=100, density=True,
alpha=0.7, color='steelblue', edgecolor='black')
# Calculate 95% VaR
var_95 = np.percentile(portfolio_returns, 5) * 100
# Color the tail
for i in range(len(patches)):
    if bins[i] < var_95:
        patches[i].set_facecolor('red')
        patches[i].set_alpha(0.7)
# Add VaR line
ax.axvline(x=var_95, color='red', linestyle='--', linewidth=2,
label=f'95% VaR = {var_95:.2f}%')
ax.set_xlabel('Daily Return (%)', fontsize=12)
ax.set_ylabel('Density', fontsize=12)
ax.set_title('Value at Risk: The 5% Worst-Case Threshold', fontsize=14, fontweight='bold')
ax.legend(loc='upper right')
ax.annotate('5% of days\nworse than VaR', xy=(var_95 - 1, 0.05), fontsize=10,
ha='center', color='red')
plt.tight_layout()
plt.show()
print(f"\n95% Daily VaR: {var_95:.2f}%")
print(f"Interpretation: On 95% of days, losses will not exceed {abs(var_95):.2f}%")
# VaR in dollar terms
portfolio_value = 1_000_000 # $1 million portfolio
var_95_dollars = portfolio_value * abs(np.percentile(portfolio_returns, 5))
var_99_dollars = portfolio_value * abs(np.percentile(portfolio_returns, 1))
print(f"Portfolio Value: ${portfolio_value:,.0f}")
print(f"\n95% Daily VaR: ${var_95_dollars:,.0f}")
print(f"99% Daily VaR: ${var_99_dollars:,.0f}")
7.1.2 Time Scaling VaR
To convert daily VaR to a $T$-day horizon (assuming i.i.d. returns with a mean close to zero):
$$VaR_T = VaR_1 \times \sqrt{T}$$
This is the square root of time rule.
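The rule follows because the variance of a sum of $T$ independent daily returns is $T\sigma^2$, so the standard deviation (and with it a zero-mean quantile) grows with $\sqrt{T}$. A minimal simulation sketch, assuming zero-mean normal daily log returns that add across days (all parameters here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, horizon, n_paths = 0.01, 10, 200_000

# Each row is one simulated 10-day path of i.i.d. zero-mean daily returns
daily = rng.normal(0.0, sigma, size=(n_paths, horizon))

var_1d = -np.quantile(daily[:, 0], 0.05)           # 95% VaR of a single day
var_10d = -np.quantile(daily.sum(axis=1), 0.05)    # 95% VaR of the 10-day sum
ratio = var_10d / var_1d

print(f"Simulated ratio: {ratio:.3f}  vs  sqrt(10) = {np.sqrt(horizon):.3f}")
```

The simulated ratio sits close to $\sqrt{10}$. With autocorrelated returns or volatility clustering the independence assumption fails, and the rule can understate or overstate multi-day risk.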
# Scale VaR to different horizons
var_1d = abs(np.percentile(portfolio_returns, 5))
horizons = [1, 5, 10, 21, 63, 252]
horizon_names = ['1 Day', '1 Week', '2 Weeks', '1 Month', '1 Quarter', '1 Year']
print("95% VaR at Different Time Horizons")
print("=" * 50)
print(f"Portfolio Value: ${portfolio_value:,.0f}")
print()
for h, name in zip(horizons, horizon_names):
    var_h = var_1d * np.sqrt(h)
    var_h_dollars = portfolio_value * var_h
    print(f"{name:<12}: {var_h*100:>6.2f}% (${var_h_dollars:>12,.0f})")
Section 7.2: Parametric VaR
Parametric VaR assumes returns follow a specific distribution (usually normal) and calculates VaR analytically.
In this section, you will learn:
- Normal distribution VaR formula
- Student-t distribution for fat tails
- Cornish-Fisher adjustment for skewness and kurtosis
7.2.1 Normal Distribution VaR
For normally distributed returns, VaR expressed as a positive loss is:
$$VaR_{\alpha} = z_{\alpha} \cdot \sigma - \mu$$
Where:
- $\mu$ = Mean return
- $\sigma$ = Standard deviation
- $z_{\alpha}$ = Standard normal quantile (1.645 for 95%, 2.326 for 99%)
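A short worked example makes the sign convention concrete. The sketch below assumes a hypothetical daily mean of 0.05% and daily volatility of 1%, and reports VaR as a positive loss using the lower-tail quantile `stats.norm.ppf(1 - confidence)`, which is negative:

```python
from scipy import stats

mu, sigma = 0.0005, 0.01   # hypothetical daily mean and volatility

var_by_conf = {}
for conf in (0.95, 0.99):
    z = stats.norm.ppf(1 - conf)           # lower-tail quantile, e.g. about -1.645
    var_by_conf[conf] = -(mu + z * sigma)  # positive number = size of the loss
    print(f"{conf:.0%} VaR: {var_by_conf[conf]:.4%}")
```

For these inputs the daily VaR is about 1.59% at 95% confidence and about 2.28% at 99%. A positive mean return slightly reduces VaR; over a one-day horizon the effect is small compared with the volatility term.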
def parametric_var_normal(returns: pd.Series, confidence: float = 0.95) -> tuple:
    """
    Calculate VaR assuming normal distribution.
    Args:
        returns: Daily return series
        confidence: Confidence level
    Returns:
        Tuple of (VaR, mean, std)
    """
    mu = returns.mean()
    sigma = returns.std()
    z = stats.norm.ppf(1 - confidence)
    var = -(mu + z * sigma)
    return var, mu, sigma
# Calculate parametric VaR
var_95_normal, mu, sigma = parametric_var_normal(portfolio_returns, 0.95)
var_99_normal, _, _ = parametric_var_normal(portfolio_returns, 0.99)
print("Parametric VaR (Normal Distribution)")
print("=" * 45)
print(f"Mean daily return: {mu*100:.4f}%")
print(f"Daily volatility: {sigma*100:.4f}%")
print(f"\n95% VaR: {var_95_normal*100:.2f}%")
print(f"99% VaR: {var_99_normal*100:.2f}%")
7.2.2 Student-t Distribution VaR
Financial returns often have fatter tails than normal. The Student-t distribution can capture this.
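How much the fat tails matter depends on how far into the tail you look. A minimal sketch comparing unit-variance quantiles of a normal and a Student-t distribution, assuming 4 degrees of freedom purely for illustration (the course code fits this value from data):

```python
import numpy as np
from scipy import stats

dof = 4                                  # assumed degrees of freedom for illustration
scale = np.sqrt((dof - 2) / dof)         # rescale so the t distribution has unit variance

tail = {}
for conf in (0.95, 0.99):
    q_norm = -stats.norm.ppf(1 - conf)
    q_t = -stats.t.ppf(1 - conf, dof, scale=scale)
    tail[conf] = (q_norm, q_t)
    print(f"{conf:.0%}: normal {q_norm:.3f} sigma, Student-t {q_t:.3f} sigma")
```

At equal variance the Student-t quantile is larger than the normal one at 99% but smaller at 95%: fat tails move probability mass into the extremes, so the difference shows up mainly at high confidence levels.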
def parametric_var_t(returns: pd.Series, confidence: float = 0.95) -> tuple:
    """
    Calculate VaR using Student-t distribution.
    Args:
        returns: Daily return series
        confidence: Confidence level
    Returns:
        Tuple of (VaR, degrees_of_freedom, loc, scale)
    """
    params = stats.t.fit(returns)
    df, loc, scale = params
    var = -stats.t.ppf(1 - confidence, df, loc, scale)
    return var, df, loc, scale
var_95_t, df, loc, scale = parametric_var_t(portfolio_returns, 0.95)
var_99_t, _, _, _ = parametric_var_t(portfolio_returns, 0.99)
print("Parametric VaR (Student-t Distribution)")
print("=" * 45)
print(f"Degrees of freedom: {df:.2f}")
print(f"Location: {loc*100:.4f}%")
print(f"Scale: {scale*100:.4f}%")
print(f"\n95% VaR: {var_95_t*100:.2f}%")
print(f"99% VaR: {var_99_t*100:.2f}%")
print(f"\nNote: Lower degrees of freedom = fatter tails")
7.2.3 Cornish-Fisher Adjustment
The Cornish-Fisher expansion adjusts normal VaR for skewness and kurtosis:
$$z_{CF} = z + \frac{1}{6}(z^2 - 1)S + \frac{1}{24}(z^3 - 3z)(K-3) - \frac{1}{36}(2z^3 - 5z)S^2$$
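To see what the expansion does, it helps to evaluate it on its own. The sketch below uses a hypothetical helper `cornish_fisher_z` with made-up moments (skewness of -0.5 and excess kurtosis of 3); with zero skewness and zero excess kurtosis the adjusted quantile collapses back to the normal one:

```python
from scipy import stats

def cornish_fisher_z(z: float, skew: float, excess_kurt: float) -> float:
    """Cornish-Fisher adjusted quantile; excess_kurt corresponds to K - 3."""
    return (z
            + (z**2 - 1) * skew / 6
            + (z**3 - 3 * z) * excess_kurt / 24
            - (2 * z**3 - 5 * z) * skew**2 / 36)

z = stats.norm.ppf(0.05)                      # lower-tail normal quantile for 95% VaR
z_sym = cornish_fisher_z(z, 0.0, 0.0)         # no adjustment expected
z_adj = cornish_fisher_z(z, -0.5, 3.0)        # illustrative negative skew, fat tails

print(f"normal z: {z:.4f}, adjusted z: {z_adj:.4f}")
```

With these illustrative moments the adjusted quantile is more negative than the normal one, so the resulting VaR is larger. The expansion is an approximation and can behave poorly when skewness or kurtosis is far from the normal case.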
def cornish_fisher_var(returns: pd.Series, confidence: float = 0.95) -> tuple:
    """
    VaR with Cornish-Fisher adjustment for skewness and kurtosis.
    Args:
        returns: Daily return series
        confidence: Confidence level
    Returns:
        Tuple of (VaR, skewness, excess_kurtosis)
    """
    mu = returns.mean()
    sigma = returns.std()
    skew = stats.skew(returns)
    kurt = stats.kurtosis(returns)
    z = stats.norm.ppf(1 - confidence)
    z_cf = (z + (z**2 - 1) * skew / 6 +
            (z**3 - 3*z) * kurt / 24 -
            (2*z**3 - 5*z) * skew**2 / 36)
    var = -(mu + z_cf * sigma)
    return var, skew, kurt
var_95_cf, skew, kurt = cornish_fisher_var(portfolio_returns, 0.95)
var_99_cf, _, _ = cornish_fisher_var(portfolio_returns, 0.99)
print("Cornish-Fisher VaR (Adjusted for Higher Moments)")
print("=" * 50)
print(f"Skewness: {skew:.4f}")
print(f"Excess Kurtosis: {kurt:.4f}")
print(f"\n95% VaR: {var_95_cf*100:.2f}%")
print(f"99% VaR: {var_99_cf*100:.2f}%")
# Compare all parametric methods
print("Parametric VaR Comparison")
print("=" * 55)
print(f"{'Method':<20} {'95% VaR':>12} {'99% VaR':>12}")
print("-" * 50)
print(f"{'Normal':<20} {var_95_normal*100:>11.2f}% {var_99_normal*100:>11.2f}%")
print(f"{'Student-t':<20} {var_95_t*100:>11.2f}% {var_99_t*100:>11.2f}%")
print(f"{'Cornish-Fisher':<20} {var_95_cf*100:>11.2f}% {var_99_cf*100:>11.2f}%")
Exercise 7.1: Portfolio Parametric VaR (Guided)
Your Task: Calculate the parametric VaR for a multi-asset portfolio using the covariance matrix.
Fill in the blanks to complete the function:
Solution:
def portfolio_parametric_var(returns: pd.DataFrame,
                             weights: np.ndarray,
                             confidence: float = 0.95) -> dict:
    cov_matrix = returns.cov()
    mean_returns = returns.mean()
    port_mean = np.dot(weights, mean_returns)
    port_variance = np.dot(weights, np.dot(cov_matrix, weights))
    port_std = np.sqrt(port_variance)
    z = stats.norm.ppf(1 - confidence)
    var = -(port_mean + z * port_std)
    return {
        'var': var,
        'port_mean': port_mean,
        'port_std': port_std
    }
result = portfolio_parametric_var(returns, portfolio_weights, 0.95)
print(f"Portfolio VaR (95%): {result['var']*100:.2f}%")
print(f"Portfolio Mean: {result['port_mean']*100:.4f}%")
print(f"Portfolio Std: {result['port_std']*100:.4f}%")
Section 7.3: Historical VaR
Historical simulation uses actual past returns—no distributional assumptions needed. The VaR is simply a percentile of historical returns.
In this section, you will learn:
- Basic historical VaR calculation
- Age-weighted historical VaR
- Rolling VaR for time-varying risk
7.3.1 Basic Historical VaR
def historical_var(returns: pd.Series, confidence: float = 0.95) -> float:
    """Calculate VaR using historical simulation."""
    var = -np.percentile(returns, (1 - confidence) * 100)
    return var
# Calculate historical VaR
var_95_hist = historical_var(portfolio_returns, 0.95)
var_99_hist = historical_var(portfolio_returns, 0.99)
print("Historical Simulation VaR")
print("=" * 40)
print(f"95% VaR: {var_95_hist*100:.2f}%")
print(f"99% VaR: {var_99_hist*100:.2f}%")
# Show worst days
print(f"\nWorst 5 days in history:")
worst_days = portfolio_returns.nsmallest(5)
for date, ret in worst_days.items():
    print(f" {date.strftime('%Y-%m-%d')}: {ret*100:.2f}%")
7.3.2 Age-Weighted Historical VaR
Recent data is often more relevant than older data. We can weight observations by recency using exponential decay.
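The decay factor controls how quickly old observations fade, and that in turn limits how much data effectively enters the estimate. A minimal sketch, assuming a decay of 0.97 and roughly ten years of daily data, that computes the half-life of the weights and the effective sample size $1 / \sum_i w_i^2$ (a common rule of thumb, used here as an assumption rather than something the course defines):

```python
import numpy as np

decay, n = 0.97, 2520                      # assumed decay and ~10 years of daily data

ages = np.arange(n - 1, -1, -1)            # oldest observation first, age 0 = most recent
weights = decay ** ages
weights /= weights.sum()                   # normalize so the weights sum to one

half_life = np.log(0.5) / np.log(decay)    # days until a weight halves
ess = 1.0 / np.sum(weights ** 2)           # effective number of observations

print(f"Half-life: {half_life:.1f} days, effective sample size: {ess:.1f}")
```

With a decay of 0.97 the weights halve about every 23 days and the effective sample is only about 66 observations, so a 99% age-weighted VaR rests on very few tail points and will be noisy.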
def age_weighted_var(returns: pd.Series,
                     confidence: float = 0.95,
                     decay: float = 0.97) -> tuple:
    """
    Historical VaR with exponential age weighting.
    Args:
        returns: Return series
        confidence: Confidence level
        decay: Decay factor (lambda)
    Returns:
        Tuple of (VaR, weights)
    """
    n = len(returns)
    # Most recent observation gets weight decay**0 = 1, the oldest decay**(n-1)
    weights = decay ** np.arange(n - 1, -1, -1)
    weights = weights / weights.sum()
    # Sort on the raw values so the result is a plain positional index array
    sorted_idx = np.argsort(returns.values)
    sorted_returns = returns.values[sorted_idx]
    sorted_weights = weights[sorted_idx]
    # VaR is the first sorted return whose cumulative weight reaches the tail probability
    cumulative_weights = np.cumsum(sorted_weights)
    var_idx = np.searchsorted(cumulative_weights, 1 - confidence)
    var = -sorted_returns[var_idx]
    return var, weights
var_95_aw, weights = age_weighted_var(portfolio_returns, 0.95)
var_99_aw, _ = age_weighted_var(portfolio_returns, 0.99)
print("Age-Weighted Historical VaR (λ = 0.97)")
print("=" * 45)
print(f"95% VaR: {var_95_aw*100:.2f}%")
print(f"99% VaR: {var_99_aw*100:.2f}%")
print(f"\nComparison with equal-weighted:")
print(f" Equal-weighted 95% VaR: {var_95_hist*100:.2f}%")
print(f" Age-weighted 95% VaR: {var_95_aw*100:.2f}%")
7.3.3 Rolling Historical VaR
VaR should be monitored over time, not just calculated once.
# Calculate rolling VaR
window = 252 # 1 year
rolling_var_95 = portfolio_returns.rolling(window).apply(
lambda x: -np.percentile(x, 5)
)
rolling_var_99 = portfolio_returns.rolling(window).apply(
lambda x: -np.percentile(x, 1)
)
# Plot
fig, axes = plt.subplots(2, 1, figsize=(14, 10), sharex=True)
# Top: Returns vs VaR
ax1 = axes[0]
ax1.plot(portfolio_returns.index, portfolio_returns * 100, 'gray', alpha=0.5, label='Daily Returns')
ax1.plot(rolling_var_95.index, -rolling_var_95 * 100, 'r-', linewidth=2, label='95% VaR Threshold')
ax1.fill_between(rolling_var_95.index, -rolling_var_95 * 100, -10, alpha=0.2, color='red')
ax1.set_ylabel('Return (%)')
ax1.set_title('Daily Returns vs Rolling 95% VaR', fontsize=12, fontweight='bold')
ax1.legend(loc='upper right')
ax1.set_ylim(-10, 10)
# Bottom: Rolling VaR levels
ax2 = axes[1]
ax2.plot(rolling_var_95.index, rolling_var_95 * 100, 'b-', linewidth=2, label='95% VaR')
ax2.plot(rolling_var_99.index, rolling_var_99 * 100, 'r-', linewidth=2, label='99% VaR')
ax2.set_xlabel('Date')
ax2.set_ylabel('VaR (%)')
ax2.set_title('Rolling VaR Over Time (252-day window)', fontsize=12, fontweight='bold')
ax2.legend()
plt.tight_layout()
plt.show()
Exercise 7.2: Weighted Historical VaR (Guided)
Your Task: Implement a function that calculates historical VaR with custom weighting schemes.
Fill in the blanks to complete the function:
Solution:
def weighted_historical_var(returns: pd.Series,
                            weights: np.ndarray,
                            confidence: float = 0.95) -> float:
    weights = weights / weights.sum()
    # Sort on the raw values so the result is a plain positional index array
    sorted_idx = np.argsort(returns.values)
    sorted_returns = returns.values[sorted_idx]
    sorted_weights = weights[sorted_idx]
    cumulative = np.cumsum(sorted_weights)
    var_idx = np.searchsorted(cumulative, 1 - confidence)
    return -sorted_returns[var_idx]
# Test with equal weights
equal_weights = np.ones(len(portfolio_returns))
var_equal = weighted_historical_var(portfolio_returns, equal_weights, 0.95)
print(f"Equal-weighted VaR: {var_equal*100:.2f}%")
# Test with recency weights
recency_weights = np.arange(1, len(portfolio_returns) + 1)
var_recency = weighted_historical_var(portfolio_returns, recency_weights, 0.95)
print(f"Recency-weighted VaR: {var_recency*100:.2f}%")
Exercise 7.3: VaR Method Comparison (Open-ended)
Your Task:
Build a function that compares VaR estimates across multiple methods and confidence levels:
- Calculate VaR using Normal, Student-t, Historical, and Cornish-Fisher methods
- Compare results at both 95% and 99% confidence levels
- Return results as a formatted DataFrame
Your implementation:
Solution:
def compare_var_methods(returns: pd.Series,
                        confidence_levels: tuple = (0.95, 0.99)) -> pd.DataFrame:
    """
    Compare VaR estimates across multiple methods.
    Args:
        returns: Return series
        confidence_levels: Confidence levels to evaluate
    Returns:
        DataFrame with VaR comparisons
    """
    # Moments and the Student-t fit do not depend on the confidence level,
    # so compute them once outside the loop
    mu, sigma = returns.mean(), returns.std()
    df, loc, scale = stats.t.fit(returns)
    skew = stats.skew(returns)
    kurt = stats.kurtosis(returns)  # excess kurtosis
    results = []
    for conf in confidence_levels:
        # Normal VaR
        z = stats.norm.ppf(1 - conf)
        var_normal = -(mu + z * sigma)
        # Student-t VaR
        var_t = -stats.t.ppf(1 - conf, df, loc, scale)
        # Historical VaR
        var_hist = -np.percentile(returns, (1 - conf) * 100)
        # Cornish-Fisher VaR
        z_cf = (z + (z**2 - 1) * skew / 6 +
                (z**3 - 3*z) * kurt / 24 -
                (2*z**3 - 5*z) * skew**2 / 36)
        var_cf = -(mu + z_cf * sigma)
        results.append({
            'Confidence': f"{round(conf * 100)}%",
            'Normal': f"{var_normal*100:.2f}%",
            'Student-t': f"{var_t*100:.2f}%",
            'Historical': f"{var_hist*100:.2f}%",
            'Cornish-Fisher': f"{var_cf*100:.2f}%"
        })
    return pd.DataFrame(results)
# Test
comparison = compare_var_methods(portfolio_returns)
print("VaR Method Comparison")
print("=" * 60)
print(comparison.to_string(index=False))
Section 7.4: Monte Carlo VaR
Monte Carlo simulation generates thousands of possible scenarios based on assumed distributions. This is flexible—we can model fat tails, correlations, and complex portfolios.
In this section, you will learn:
- Basic Monte Carlo VaR with normal distribution
- Monte Carlo with fat tails (Student-t)
- Correlated multi-asset Monte Carlo
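A Monte Carlo VaR is itself an estimate, and its sampling error shrinks as the number of simulations grows. A minimal sketch, assuming normally distributed returns with made-up parameters, that compares simulated VaR against the closed-form normal VaR for several simulation counts:

```python
import numpy as np
from scipy import stats

mu, sigma, conf = 0.0005, 0.01, 0.95       # illustrative parameters
analytic = -(mu + stats.norm.ppf(1 - conf) * sigma)

rng = np.random.default_rng(42)
errors = {}
for n in (1_000, 10_000, 200_000):
    sims = rng.normal(mu, sigma, n)
    mc_var = -np.percentile(sims, (1 - conf) * 100)
    errors[n] = abs(mc_var - analytic)     # distance from the exact answer
    print(f"n = {n:>7,}: MC VaR {mc_var:.4%}, abs error {errors[n]:.5%}")
```

The error is random for any single run, but its typical size falls roughly with $1/\sqrt{n}$. When the model is a plain normal distribution the closed form is available, so simulation earns its cost only for portfolios or distributions without one.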
7.4.1 Basic Monte Carlo VaR
def monte_carlo_var(mu: float,
                    sigma: float,
                    confidence: float = 0.95,
                    n_simulations: int = 10000,
                    seed: int = 42) -> tuple:
    """
    Monte Carlo VaR assuming normal distribution.
    Args:
        mu: Mean return
        sigma: Standard deviation
        confidence: Confidence level
        n_simulations: Number of simulations
        seed: Random seed
    Returns:
        Tuple of (VaR, simulated_returns)
    """
    np.random.seed(seed)
    simulated_returns = np.random.normal(mu, sigma, n_simulations)
    var = -np.percentile(simulated_returns, (1 - confidence) * 100)
    return var, simulated_returns
# Calculate MC VaR
mu = portfolio_returns.mean()
sigma = portfolio_returns.std()
var_95_mc, sim_returns = monte_carlo_var(mu, sigma, 0.95)
var_99_mc, _ = monte_carlo_var(mu, sigma, 0.99)
print("Monte Carlo VaR (10,000 simulations)")
print("=" * 45)
print(f"95% VaR: {var_95_mc*100:.2f}%")
print(f"99% VaR: {var_99_mc*100:.2f}%")
# Visualize simulated distribution
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Historical vs Simulated
ax1 = axes[0]
ax1.hist(portfolio_returns * 100, bins=50, density=True, alpha=0.7, label='Historical')
ax1.hist(sim_returns * 100, bins=50, density=True, alpha=0.5, label='MC Simulated')
ax1.axvline(x=-var_95_hist*100, color='blue', linestyle='--', label=f'Hist VaR: {var_95_hist*100:.2f}%')
ax1.axvline(x=-var_95_mc*100, color='orange', linestyle='--', label=f'MC VaR: {var_95_mc*100:.2f}%')
ax1.set_xlabel('Daily Return (%)')
ax1.set_ylabel('Density')
ax1.set_title('Historical vs Monte Carlo Distribution', fontweight='bold')
ax1.legend()
# Q-Q plot
ax2 = axes[1]
stats.probplot(portfolio_returns, dist="norm", plot=ax2)
ax2.set_title('Q-Q Plot: Returns vs Normal', fontweight='bold')
plt.tight_layout()
plt.show()
print("\nQ-Q plot shows fat tails: extreme returns exceed normal expectations.")
7.4.2 Monte Carlo with Fat Tails
def monte_carlo_var_t(returns: pd.Series,
                      confidence: float = 0.95,
                      n_simulations: int = 10000,
                      seed: int = 42) -> tuple:
    """
    Monte Carlo VaR using fitted Student-t distribution.
    Args:
        returns: Historical returns for fitting
        confidence: Confidence level
        n_simulations: Number of simulations
        seed: Random seed
    Returns:
        Tuple of (VaR, simulated_returns, degrees_of_freedom)
    """
    np.random.seed(seed)
    df, loc, scale = stats.t.fit(returns)
    simulated = stats.t.rvs(df, loc=loc, scale=scale, size=n_simulations)
    var = -np.percentile(simulated, (1 - confidence) * 100)
    return var, simulated, df
var_95_mc_t, sim_t, df = monte_carlo_var_t(portfolio_returns, 0.95)
var_99_mc_t, _, _ = monte_carlo_var_t(portfolio_returns, 0.99)
print("Monte Carlo VaR with Student-t Distribution")
print("=" * 50)
print(f"Fitted degrees of freedom: {df:.2f}")
print(f"\n95% VaR: {var_95_mc_t*100:.2f}%")
print(f"99% VaR: {var_99_mc_t*100:.2f}%")
print(f"\nFat tails increase extreme risk estimates by {((var_99_mc_t - var_99_mc)/var_99_mc)*100:.1f}%")
7.4.3 Correlated Multi-Asset Monte Carlo
def multivariate_mc_var(returns: pd.DataFrame,
                        weights: np.ndarray,
                        confidence: float = 0.95,
                        n_simulations: int = 10000,
                        seed: int = 42) -> tuple:
    """
    Monte Carlo VaR for multi-asset portfolio with correlations.
    Args:
        returns: DataFrame of asset returns
        weights: Portfolio weights
        confidence: Confidence level
        n_simulations: Number of simulations
        seed: Random seed
    Returns:
        Tuple of (VaR, portfolio_returns, asset_returns)
    """
    np.random.seed(seed)
    mean_returns = returns.mean().values
    cov_matrix = returns.cov().values
    simulated_asset_returns = np.random.multivariate_normal(
        mean_returns, cov_matrix, n_simulations
    )
    simulated_portfolio_returns = simulated_asset_returns @ weights
    var = -np.percentile(simulated_portfolio_returns, (1 - confidence) * 100)
    return var, simulated_portfolio_returns, simulated_asset_returns
var_95_multi, sim_port, sim_assets = multivariate_mc_var(returns, portfolio_weights, 0.95)
var_99_multi, _, _ = multivariate_mc_var(returns, portfolio_weights, 0.99)
print("Multi-Asset Monte Carlo VaR (with Correlations)")
print("=" * 50)
print(f"Portfolio: {dict(zip(tickers, portfolio_weights))}")
print(f"\n95% VaR: {var_95_multi*100:.2f}%")
print(f"99% VaR: {var_99_multi*100:.2f}%")
# Compare all VaR methods
print("\nVaR Method Comparison")
print("=" * 60)
print(f"{'Method':<25} {'95% VaR':>12} {'99% VaR':>12}")
print("-" * 55)
print(f"{'Parametric (Normal)':<25} {var_95_normal*100:>11.2f}% {var_99_normal*100:>11.2f}%")
print(f"{'Parametric (Student-t)':<25} {var_95_t*100:>11.2f}% {var_99_t*100:>11.2f}%")
print(f"{'Parametric (CF)':<25} {var_95_cf*100:>11.2f}% {var_99_cf*100:>11.2f}%")
print(f"{'Historical':<25} {var_95_hist*100:>11.2f}% {var_99_hist*100:>11.2f}%")
print(f"{'Age-Weighted Historical':<25} {var_95_aw*100:>11.2f}% {var_99_aw*100:>11.2f}%")
print(f"{'Monte Carlo (Normal)':<25} {var_95_mc*100:>11.2f}% {var_99_mc*100:>11.2f}%")
print(f"{'Monte Carlo (t-dist)':<25} {var_95_mc_t*100:>11.2f}% {var_99_mc_t*100:>11.2f}%")
print(f"{'MC Multi-Asset':<25} {var_95_multi*100:>11.2f}% {var_99_multi*100:>11.2f}%")
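`np.random.multivariate_normal` hides how correlation enters the simulation. One common way to build correlated draws by hand is the Cholesky factor of the covariance matrix; the sketch below uses two hypothetical assets with made-up volatilities and a correlation of -0.3:

```python
import numpy as np

rng = np.random.default_rng(7)

vols = np.array([0.01, 0.02])                       # hypothetical daily volatilities
corr = np.array([[1.0, -0.3],
                 [-0.3, 1.0]])                      # hypothetical correlation matrix
cov = np.outer(vols, vols) * corr

chol = np.linalg.cholesky(cov)                      # lower-triangular L with L @ L.T == cov
z = rng.standard_normal((100_000, 2))               # independent standard normals
sims = z @ chol.T                                   # correlated zero-mean returns

sample_corr = np.corrcoef(sims, rowvar=False)[0, 1]
print(f"Target correlation: -0.30, simulated: {sample_corr:.3f}")
```

Multiplying independent normals by the Cholesky factor reproduces the target covariance, which is why diversification benefits appear in the simulated portfolio returns.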
Exercise 7.4: Monte Carlo Simulation Engine (Guided)
Your Task: Build a Monte Carlo simulator that can use different distribution assumptions.
Fill in the blanks to complete the function:
Solution:
def monte_carlo_engine(returns: pd.Series,
                       n_simulations: int = 10000,
                       distribution: str = 'normal',
                       seed: int = 42) -> np.ndarray:
    np.random.seed(seed)
    if distribution == 'normal':
        mu = returns.mean()
        sigma = returns.std()
        simulated = np.random.normal(mu, sigma, n_simulations)
    elif distribution == 't':
        df, loc, scale = stats.t.fit(returns)
        simulated = stats.t.rvs(df, loc=loc, scale=scale, size=n_simulations)
    else:
        # Fail loudly instead of hitting an undefined variable below
        raise ValueError(f"Unknown distribution: {distribution!r}")
    return simulated
# Test
sim_normal = monte_carlo_engine(portfolio_returns, distribution='normal')
sim_t = monte_carlo_engine(portfolio_returns, distribution='t')
print("Monte Carlo Results")
print(f"Normal 95% VaR: {-np.percentile(sim_normal, 5)*100:.2f}%")
print(f"Normal 99% VaR: {-np.percentile(sim_normal, 1)*100:.2f}%")
print(f"Student-t 95% VaR: {-np.percentile(sim_t, 5)*100:.2f}%")
print(f"Student-t 99% VaR: {-np.percentile(sim_t, 1)*100:.2f}%")
Exercise 7.5: VaR Backtesting (Open-ended)
Your Task:
Build a VaR backtesting function that:
- Calculates rolling VaR over a specified window
- Counts how many times actual losses exceeded VaR (violations)
- Compares violation rate to expected rate
- Returns detailed statistics and violation dates
Your implementation:
Solution:
def backtest_var(returns: pd.Series,
                 window: int = 252,
                 confidence: float = 0.95,
                 method: str = 'historical') -> dict:
    """
    Backtest VaR model by comparing predictions to actual losses.
    Args:
        returns: Return series
        window: Rolling window for VaR calculation
        confidence: Confidence level
        method: 'historical' or 'parametric'
    Returns:
        Dictionary with backtest results
    """
    # Calculate rolling VaR
    if method == 'historical':
        rolling_var = returns.rolling(window).apply(
            lambda x: -np.percentile(x, (1 - confidence) * 100)
        )
    else:  # parametric
        def calc_param_var(x):
            mu, sigma = x.mean(), x.std()
            z = stats.norm.ppf(1 - confidence)
            return -(mu + z * sigma)
        rolling_var = returns.rolling(window).apply(calc_param_var)
    # Shift by one day so each VaR forecast uses only data available before
    # the return it is compared against (avoids look-ahead bias)
    rolling_var = rolling_var.shift(1).dropna()
    aligned_returns = returns.loc[rolling_var.index]
    # Find violations (actual loss > VaR)
    violations = aligned_returns < -rolling_var
    violation_dates = violations[violations].index
    # Statistics
    total_days = len(rolling_var)
    violation_count = int(violations.sum())
    expected_rate = 1 - confidence
    actual_rate = violation_count / total_days
    # Exact binomial test on the violation count (same null hypothesis as
    # Kupiec's proportion-of-failures test). stats.binomtest replaces
    # stats.binom_test, which recent SciPy releases no longer provide.
    p_value = stats.binomtest(violation_count, total_days, expected_rate,
                              alternative='two-sided').pvalue
    return {
        'total_days': total_days,
        'violations': violation_count,
        'expected_violations': int(total_days * expected_rate),
        'expected_rate': expected_rate,
        'actual_rate': actual_rate,
        'p_value': p_value,
        'model_valid': p_value > 0.05,
        'violation_dates': violation_dates,
        'rolling_var': rolling_var
    }
# Run backtest
results = backtest_var(portfolio_returns, window=252, confidence=0.95)
print("VaR Backtest Results")
print("=" * 50)
print(f"Total days tested: {results['total_days']}")
print(f"Expected violations (5%): {results['expected_violations']}")
print(f"Actual violations: {results['violations']}")
print(f"Actual rate: {results['actual_rate']*100:.2f}%")
print(f"P-value (binomial test): {results['p_value']:.4f}")
print(f"Model valid (p > 0.05): {results['model_valid']}")
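Kupiec's proportion-of-failures (POF) test is usually stated as a likelihood ratio rather than an exact binomial test: it compares the likelihood of the observed violations under the expected rate with the likelihood under the observed rate, and the statistic is asymptotically chi-square with one degree of freedom. A minimal sketch (the function name `kupiec_pof` and the example counts are illustrative assumptions):

```python
import numpy as np
from scipy import stats

def kupiec_pof(violations: int, n_obs: int, expected_rate: float) -> tuple:
    """Return (LR statistic, p-value) for Kupiec's proportion-of-failures test."""
    x, n, p = violations, n_obs, expected_rate
    pi_hat = x / n
    # Log-likelihood under the null (violation probability equals expected_rate)
    ll_null = (n - x) * np.log(1 - p) + x * np.log(p)
    # Log-likelihood under the observed rate, skipping 0 * log(0) terms
    ll_alt = 0.0
    if x > 0:
        ll_alt += x * np.log(pi_hat)
    if x < n:
        ll_alt += (n - x) * np.log(1 - pi_hat)
    lr = -2.0 * (ll_null - ll_alt)
    p_value = stats.chi2.sf(lr, df=1)
    return lr, p_value

lr_ok, p_ok = kupiec_pof(50, 1000, 0.05)     # violations match the expected 5%
lr_bad, p_bad = kupiec_pof(80, 1000, 0.05)   # too many violations
print(f"50/1000: LR = {lr_ok:.2f}, p = {p_ok:.4f}")
print(f"80/1000: LR = {lr_bad:.2f}, p = {p_bad:.6f}")
```

When violations match the expected rate the statistic is essentially zero and the model is not rejected; 80 violations in 1,000 days at a 5% expected rate produce a very small p-value. The test looks only at the violation count, not at whether violations cluster in time.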
Exercise 7.6: Complete VaR Analysis System (Open-ended)
Your Task:
Build a comprehensive VaR class that includes:
- Multiple calculation methods (parametric, historical, Monte Carlo)
- Time scaling to different horizons
- Dollar VaR conversion for a given portfolio value
- Backtesting capability
- Summary report method
Your implementation:
Click to reveal solution
class VaRAnalyzer:
"""
Comprehensive Value at Risk analyzer.
Supports multiple calculation methods, time scaling,
and backtesting capabilities.
"""
def __init__(self, returns: pd.Series, portfolio_value: float = 1_000_000):
self.returns = returns
self.portfolio_value = portfolio_value
self.mu = returns.mean()
self.sigma = returns.std()
self.results = {}
def parametric_normal(self, confidence: float = 0.95) -> float:
z = stats.norm.ppf(1 - confidence)
var = -(self.mu + z * self.sigma)
self.results[f'normal_{int(confidence*100)}'] = var
return var
def parametric_t(self, confidence: float = 0.95) -> float:
params = stats.t.fit(self.returns)
var = -stats.t.ppf(1 - confidence, *params)
self.results[f't_{int(confidence*100)}'] = var
return var
def historical(self, confidence: float = 0.95) -> float:
var = -np.percentile(self.returns, (1 - confidence) * 100)
self.results[f'hist_{int(confidence*100)}'] = var
return var
def monte_carlo(self, confidence: float = 0.95,
n_sims: int = 10000) -> float:
np.random.seed(42)
sims = np.random.normal(self.mu, self.sigma, n_sims)
var = -np.percentile(sims, (1 - confidence) * 100)
self.results[f'mc_{int(confidence*100)}'] = var
return var
def scale_var(self, daily_var: float, horizon: int) -> float:
"""Scale daily VaR to different time horizon."""
return daily_var * np.sqrt(horizon)
def dollar_var(self, var_pct: float) -> float:
"""Convert percentage VaR to dollar amount."""
return self.portfolio_value * var_pct
def calculate_all(self, confidence: float = 0.95) -> dict:
return {
'normal': self.parametric_normal(confidence),
't': self.parametric_t(confidence),
'historical': self.historical(confidence),
'monte_carlo': self.monte_carlo(confidence)
}
def backtest(self, window: int = 252, confidence: float = 0.95) -> dict:
rolling_var = self.returns.rolling(window).apply(
lambda x: -np.percentile(x, (1-confidence)*100)
).shift(1).dropna()  # lag by one day so each VaR uses only data available before that day
aligned = self.returns.loc[rolling_var.index]
violations = (aligned < -rolling_var).sum()
expected = len(rolling_var) * (1 - confidence)
return {
'violations': violations,
'expected': int(expected),
'rate': violations / len(rolling_var)
}
def summary(self):
print("\n" + "=" * 60)
print("VaR ANALYSIS SUMMARY")
print("=" * 60)
print(f"Portfolio Value: ${self.portfolio_value:,.0f}")
for conf in [0.95, 0.99]:
results = self.calculate_all(conf)
print(f"\n{int(conf*100)}% Confidence Level:")
print("-" * 40)
for method, var in results.items():
dollar = self.dollar_var(var)
print(f" {method:<15}: {var*100:>6.2f}% (${dollar:>12,.0f})")
bt = self.backtest()
print(f"\nBacktest Results:")
print(f" Violations: {bt['violations']} (expected: {bt['expected']})")
print(f" Rate: {bt['rate']*100:.2f}%")
# Test
analyzer = VaRAnalyzer(portfolio_returns, portfolio_value=1_000_000)
analyzer.summary()
Module Project: Production VaR Risk System
Build a comprehensive VaR calculation and monitoring system suitable for production use.
# YOUR CODE HERE - Module Project
Solution:
class ProductionVaRSystem:
"""
Production-ready Value at Risk system.
Features:
- Multiple VaR calculation methods
- Portfolio-level and asset-level analysis
- Backtesting and violation tracking
- Time horizon scaling
- Comprehensive reporting
"""
def __init__(self, returns: pd.DataFrame,
weights: np.ndarray = None,
portfolio_value: float = 1_000_000):
"""
Initialize VaR system.
Args:
returns: DataFrame of asset returns
weights: Portfolio weights (equal if None)
portfolio_value: Dollar value of portfolio
"""
self.returns = returns
self.assets = list(returns.columns)
self.weights = weights if weights is not None else \
np.ones(len(self.assets)) / len(self.assets)
self.portfolio_value = portfolio_value
# Calculate portfolio returns
self.portfolio_returns = returns.dot(self.weights)
# Store results
self.var_results = {}
def parametric_var(self, confidence: float = 0.95,
distribution: str = 'normal') -> dict:
"""Calculate parametric VaR."""
ret = self.portfolio_returns
if distribution == 'normal':
mu, sigma = ret.mean(), ret.std()
z = stats.norm.ppf(1 - confidence)
var = -(mu + z * sigma)
elif distribution == 't':
params = stats.t.fit(ret)
var = -stats.t.ppf(1 - confidence, *params)
elif distribution == 'cornish_fisher':
mu, sigma = ret.mean(), ret.std()
skew = stats.skew(ret)
kurt = stats.kurtosis(ret)
z = stats.norm.ppf(1 - confidence)
z_cf = (z + (z**2 - 1) * skew / 6 +
(z**3 - 3*z) * kurt / 24 -
(2*z**3 - 5*z) * skew**2 / 36)
var = -(mu + z_cf * sigma)
self.var_results[f'parametric_{distribution}_{int(confidence*100)}'] = var
return {'var': var, 'dollar_var': var * self.portfolio_value}
def historical_var(self, confidence: float = 0.95,
weighted: bool = False,
decay: float = 0.97) -> dict:
"""Calculate historical VaR."""
ret = self.portfolio_returns
if not weighted:
var = -np.percentile(ret, (1 - confidence) * 100)
else:
n = len(ret)
weights = np.array([decay ** i for i in range(n-1, -1, -1)])
weights = weights / weights.sum()
sorted_idx = np.argsort(ret)
sorted_returns = ret.values[sorted_idx]
sorted_weights = weights[sorted_idx]
cumulative = np.cumsum(sorted_weights)
var_idx = np.searchsorted(cumulative, 1 - confidence)
var = -sorted_returns[var_idx]
method = 'historical_weighted' if weighted else 'historical'
self.var_results[f'{method}_{int(confidence*100)}'] = var
return {'var': var, 'dollar_var': var * self.portfolio_value}
def monte_carlo_var(self, confidence: float = 0.95,
n_sims: int = 10000,
multivariate: bool = False) -> dict:
"""Calculate Monte Carlo VaR."""
np.random.seed(42)
if not multivariate:
mu = self.portfolio_returns.mean()
sigma = self.portfolio_returns.std()
sims = np.random.normal(mu, sigma, n_sims)
else:
mean_returns = self.returns.mean().values
cov_matrix = self.returns.cov().values
asset_sims = np.random.multivariate_normal(
mean_returns, cov_matrix, n_sims
)
sims = asset_sims @ self.weights
var = -np.percentile(sims, (1 - confidence) * 100)
method = 'mc_multivariate' if multivariate else 'mc_univariate'
self.var_results[f'{method}_{int(confidence*100)}'] = var
return {'var': var, 'dollar_var': var * self.portfolio_value}
def scale_to_horizon(self, daily_var: float, horizon: int) -> float:
"""Scale daily VaR to different time horizon."""
return daily_var * np.sqrt(horizon)
def calculate_all(self, confidence: float = 0.95) -> pd.DataFrame:
"""Calculate VaR using all methods."""
results = []
# Parametric methods
for dist in ['normal', 't', 'cornish_fisher']:
res = self.parametric_var(confidence, dist)
results.append({
'Method': f'Parametric ({dist})',
'VaR (%)': res['var'] * 100,
'Dollar VaR': res['dollar_var']
})
# Historical methods
for weighted in [False, True]:
res = self.historical_var(confidence, weighted)
name = 'Historical (weighted)' if weighted else 'Historical'
results.append({
'Method': name,
'VaR (%)': res['var'] * 100,
'Dollar VaR': res['dollar_var']
})
# Monte Carlo methods
for multi in [False, True]:
res = self.monte_carlo_var(confidence, multivariate=multi)
name = 'Monte Carlo (multi)' if multi else 'Monte Carlo'
results.append({
'Method': name,
'VaR (%)': res['var'] * 100,
'Dollar VaR': res['dollar_var']
})
return pd.DataFrame(results)
def backtest(self, window: int = 252, confidence: float = 0.95) -> dict:
"""Backtest VaR model."""
rolling_var = self.portfolio_returns.rolling(window).apply(
lambda x: -np.percentile(x, (1 - confidence) * 100)
).shift(1).dropna()  # lag by one day so each VaR uses only data available before that day
aligned = self.portfolio_returns.loc[rolling_var.index]
violations = aligned < -rolling_var
violation_count = violations.sum()
total_days = len(rolling_var)
expected = total_days * (1 - confidence)
return {
'total_days': total_days,
'violations': violation_count,
'expected': int(expected),
'rate': violation_count / total_days,
'expected_rate': 1 - confidence,
'pass': abs(violation_count - expected) < 2 * np.sqrt(expected)
}
def report(self):
"""Generate comprehensive VaR report."""
print("\n" + "=" * 70)
print("PRODUCTION VaR RISK REPORT")
print("=" * 70)
print(f"\nPortfolio Value: ${self.portfolio_value:,.0f}")
print(f"Assets: {self.assets}")
print(f"Weights: {dict(zip(self.assets, self.weights))}")
# VaR at multiple confidence levels
for conf in [0.95, 0.99]:
print(f"\n{'='*70}")
print(f"{int(conf*100)}% VALUE AT RISK")
print("=" * 70)
df = self.calculate_all(conf)
df['Dollar VaR'] = df['Dollar VaR'].apply(lambda x: f"${x:,.0f}")
df['VaR (%)'] = df['VaR (%)'].apply(lambda x: f"{x:.2f}%")
print(df.to_string(index=False))
# Time horizon scaling
print(f"\n{'='*70}")
print("TIME HORIZON SCALING (95% Historical VaR)")
print("=" * 70)
base_var = -np.percentile(self.portfolio_returns, 5)
horizons = {'1 Day': 1, '1 Week': 5, '2 Weeks': 10,
'1 Month': 21, '1 Quarter': 63}
for name, days in horizons.items():
scaled = self.scale_to_horizon(base_var, days)
dollar = scaled * self.portfolio_value
print(f" {name:<12}: {scaled*100:>6.2f}% (${dollar:>12,.0f})")
# Backtest
print(f"\n{'='*70}")
print("BACKTEST RESULTS")
print("=" * 70)
bt = self.backtest()
print(f" Total days tested: {bt['total_days']}")
print(f" Expected violations (5%): {bt['expected']}")
print(f" Actual violations: {bt['violations']}")
print(f" Violation rate: {bt['rate']*100:.2f}%")
print(f" Model status: {'PASS' if bt['pass'] else 'FAIL'}")
# Test the production system
system = ProductionVaRSystem(
returns=returns,
weights=portfolio_weights,
portfolio_value=1_000_000
)
system.report()
Key Takeaways
What You Learned
1. VaR Fundamentals
- VaR is the loss threshold that should not be exceeded, at a given confidence level, over a given horizon (it is not the maximum possible loss)
- Common parameters: 95%/99% confidence, 1-day horizon
- Square root of time rule for scaling horizons
2. Parametric VaR
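- Normal: $VaR = -(\mu + z_\alpha \sigma)$, where $z_\alpha = \Phi^{-1}(\alpha)$ is the lower-tail normal quantile at tail probability $\alpha = 1 - \text{confidence}$ (so $z_\alpha < 0$ and VaR is a positive loss)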
- Student-t captures fat tails
- Cornish-Fisher adjusts for skewness and kurtosis
3. Historical VaR
- No distributional assumptions required
- Age-weighting gives recent observations more influence
- Rolling VaR for time-varying risk
4. Monte Carlo VaR
- Flexible for complex portfolios
- Can incorporate correlations and fat tails
- Multivariate simulation for portfolio risk
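The square-root-of-time rule listed under VaR Fundamentals rests on returns being independent and identically distributed with negligible drift. A minimal sketch on simulated data (the volatility, sample size, and seed are illustrative assumptions, not market estimates) compares a scaled 1-day VaR with the VaR measured directly on 10-day returns:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated i.i.d. daily log returns with zero drift (illustrative assumption)
daily = rng.normal(loc=0.0, scale=0.01, size=250_000)

# 1-day 95% historical VaR, reported as a positive loss
var_1d = -np.percentile(daily, 5)

# Scale to a 10-day horizon with the square-root-of-time rule
var_10d_scaled = var_1d * np.sqrt(10)

# Direct estimate: log returns add across days, so sum non-overlapping 10-day blocks
ten_day = daily.reshape(-1, 10).sum(axis=1)
var_10d_direct = -np.percentile(ten_day, 5)

print(f"Scaled 10-day VaR: {var_10d_scaled:.4f}")
print(f"Direct 10-day VaR: {var_10d_direct:.4f}")
```

The two figures agree closely here because the simulated returns satisfy the rule's assumptions. With autocorrelated returns or volatility clustering, the scaled figure can differ noticeably from the direct one.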
VaR Limitations
- Says nothing about how large losses can be once VaR is breached
- Not sub-additive: a portfolio's VaR can exceed the sum of its components' VaRs, which penalizes diversification
- Assumes historical patterns continue
Coming Up Next
In Module 8: Beyond VaR, we'll explore:
- Expected Shortfall (CVaR) for tail risk
- Stress testing with historical and hypothetical scenarios
- Drawdown analysis and duration risk
- Advanced tail risk measures
Congratulations on completing Module 7!
Module 8: Beyond VaR
Course 3: Quantitative Finance & Portfolio Theory
Part 3: Risk Modeling
Learning Objectives
By the end of this module, you will be able to:
- Calculate Expected Shortfall (CVaR) and understand its advantages over VaR
- Design and implement historical and hypothetical stress tests
- Analyze drawdowns and duration risk
- Apply advanced tail risk measures including Omega and Sortino ratios
| Attribute | Value |
|---|---|
| Duration | ~2.5 hours |
| Exercises | 6 (3 guided + 3 open-ended) |
| Prerequisites | Module 7: Value at Risk |
Setup and Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
from scipy import stats
from scipy.optimize import minimize
import warnings
warnings.filterwarnings('ignore')
pd.set_option('display.float_format', lambda x: f'{x:.4f}')
np.set_printoptions(precision=4)
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
np.random.seed(42)
print('Libraries loaded successfully!')
Load Data
# Download portfolio data
tickers = ['SPY', 'QQQ', 'TLT', 'GLD']
data = yf.download(tickers, start='2006-01-01', end='2024-01-01', progress=False)
# Handle MultiIndex columns
if isinstance(data.columns, pd.MultiIndex):
if 'Adj Close' in data.columns.get_level_values(0):
prices = data['Adj Close']
elif 'Close' in data.columns.get_level_values(0):
prices = data['Close']
else:
prices = data.iloc[:, :len(tickers)]
else:
prices = data['Adj Close'] if 'Adj Close' in data.columns else data['Close']
prices.columns = [str(col) for col in prices.columns]
returns = prices.pct_change().dropna()
# Portfolio weights
weights = np.array([0.40, 0.20, 0.25, 0.15])
portfolio_returns = returns.dot(weights)
print(f"Data loaded: {returns.index[0].strftime('%Y-%m-%d')} to {returns.index[-1].strftime('%Y-%m-%d')}")
print(f"Assets: {list(returns.columns)}")
print(f"Total observations: {len(returns)}")
Section 8.1: Expected Shortfall (CVaR)
VaR tells you the threshold loss at a given confidence level, but it says nothing about how bad things can get when you exceed that threshold. Expected Shortfall (ES), also called Conditional VaR (CVaR), addresses this limitation.
In this section, you will learn:
- The definition and interpretation of Expected Shortfall
- Why regulators prefer ES over VaR
- Multiple calculation methods for ES
8.1.1 Expected Shortfall Definition
Expected Shortfall at confidence level $\alpha$ measures the expected loss given that we've exceeded VaR:
$$ES_{\alpha} = E[L | L > VaR_{\alpha}]$$
In plain English: "When bad days happen, how bad are they on average?"
def calculate_var_es(returns: pd.Series, confidence: float = 0.95) -> tuple:
"""
Calculate VaR and Expected Shortfall using historical simulation.
Args:
returns: Historical returns
confidence: Confidence level (e.g., 0.95 for 95%)
Returns:
Tuple of (VaR, ES) both as positive numbers representing losses
"""
returns_arr = np.array(returns)
alpha = 1 - confidence
# VaR is the alpha quantile of returns
var = -np.percentile(returns_arr, alpha * 100)
# ES is the average of returns worse than VaR
threshold = np.percentile(returns_arr, alpha * 100)
tail_returns = returns_arr[returns_arr <= threshold]
es = -np.mean(tail_returns)
return var, es
# Calculate for SPY
spy_returns = returns['SPY']
var_95, es_95 = calculate_var_es(spy_returns, 0.95)
var_99, es_99 = calculate_var_es(spy_returns, 0.99)
print("SPY Risk Measures (Historical)")
print("=" * 40)
print(f"\n95% Confidence:")
print(f" VaR: {var_95*100:.2f}%")
print(f" Expected Shortfall: {es_95*100:.2f}%")
print(f" ES/VaR Ratio: {es_95/var_95:.2f}x")
print(f"\n99% Confidence:")
print(f" VaR: {var_99*100:.2f}%")
print(f" Expected Shortfall: {es_99*100:.2f}%")
print(f" ES/VaR Ratio: {es_99/var_99:.2f}x")
# Visualize the difference between VaR and ES
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Left plot: Distribution with VaR and ES
ax1 = axes[0]
var_threshold = np.percentile(spy_returns, 5)
n, bins, patches = ax1.hist(spy_returns, bins=100, density=True, alpha=0.7,
color='steelblue', edgecolor='white')
# Color the tail
for i, (patch, left_edge) in enumerate(zip(patches, bins[:-1])):
if left_edge < var_threshold:
patch.set_facecolor('crimson')
ax1.axvline(-var_95, color='orange', linewidth=2, linestyle='--',
label=f'95% VaR: {var_95*100:.2f}%')
ax1.axvline(-es_95, color='darkred', linewidth=2, linestyle='-',
label=f'95% ES: {es_95*100:.2f}%')
ax1.set_xlabel('Daily Return')
ax1.set_ylabel('Density')
ax1.set_title('VaR vs Expected Shortfall\n(Red area = Tail losses beyond VaR)')
ax1.legend()
ax1.set_xlim(-0.12, 0.12)
# Right plot: Tail losses only
ax2 = axes[1]
tail_losses = -spy_returns[spy_returns < var_threshold]
ax2.hist(tail_losses * 100, bins=30, alpha=0.7, color='crimson', edgecolor='white')
ax2.axvline(var_95 * 100, color='orange', linewidth=2, linestyle='--',
label=f'VaR: {var_95*100:.2f}%')
ax2.axvline(es_95 * 100, color='darkred', linewidth=2, linestyle='-',
label=f'ES (avg): {es_95*100:.2f}%')
ax2.set_xlabel('Loss (%)')
ax2.set_ylabel('Frequency')
ax2.set_title(f'Distribution of Tail Losses\n({len(tail_losses)} observations beyond VaR)')
ax2.legend()
plt.tight_layout()
plt.show()
print(f"\nKey Insight: When losses exceed VaR ({var_95*100:.2f}%), they average {es_95*100:.2f}%")
8.1.2 Why Regulators Prefer Expected Shortfall
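Under the Fundamental Review of the Trading Book (FRTB), the Basel Committee replaced 99% VaR with 97.5% Expected Shortfall as the market-risk capital measure. ES has several properties that VaR lacks: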
| Property | VaR | Expected Shortfall |
|---|---|---|
| Captures tail risk | No | Yes |
| Coherent risk measure | No | Yes |
| Sub-additive | No | Yes |
| Penalizes concentration | No | Yes |
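Sub-additivity is crucial: $Risk(A + B) \leq Risk(A) + Risk(B)$. A sub-additive measure never reports a combined portfolio as riskier than its parts added together, so diversification is never penalized.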
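A standard textbook counterexample shows how VaR can fail this inequality while ES does not. The sketch below assumes two independent bonds, each losing 100 with probability 4% and nothing otherwise; the numbers and the helper `var_es_discrete` are illustrative and not part of the course data.

```python
import numpy as np

def var_es_discrete(losses, probs, confidence=0.95):
    """VaR and ES for a discrete loss distribution (losses are positive numbers)."""
    losses = np.asarray(losses, dtype=float)
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(losses)
    losses, probs = losses[order], probs[order]

    # VaR: smallest loss level whose cumulative probability reaches the confidence level
    cdf = np.cumsum(probs)
    var = losses[np.searchsorted(cdf, confidence - 1e-12)]

    # ES: average loss over the worst (1 - confidence) share of probability mass
    tail = 1 - confidence
    beyond = losses > var
    mass_beyond = probs[beyond].sum()
    es = (np.sum(losses[beyond] * probs[beyond]) + var * (tail - mass_beyond)) / tail
    return var, es

p_default = 0.04

# One bond: lose 0 with probability 96%, lose 100 with probability 4%
var_single, es_single = var_es_discrete([0, 100], [1 - p_default, p_default])

# Two independent bonds: 0, 1, or 2 defaults
port_losses = [0, 100, 200]
port_probs = [(1 - p_default) ** 2, 2 * p_default * (1 - p_default), p_default ** 2]
var_port, es_port = var_es_discrete(port_losses, port_probs)

print(f"VaR: portfolio {var_port:.0f} vs sum of parts {2 * var_single:.0f}")
print(f"ES:  portfolio {es_port:.1f} vs sum of parts {2 * es_single:.1f}")
```

Each bond on its own has a 95% VaR of 0, because its 4% default probability sits inside the 5% tail. The two-bond portfolio has a 95% VaR of 100, so VaR reports the diversified position as riskier than its parts. ES remains sub-additive: 103.2 for the portfolio against 160 for the parts combined.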
# Demonstrate sub-additivity with portfolio diversification
print("Sub-Additivity Test: Diversification Benefits")
print("=" * 50)
# Individual asset risk measures
results = []
for asset in returns.columns:
var, es = calculate_var_es(returns[asset], 0.95)
results.append({'Asset': asset, 'VaR_95': var, 'ES_95': es})
print(f"{asset}: VaR = {var*100:.2f}%, ES = {es*100:.2f}%")
# Equal-weighted portfolio
port_returns = returns.mean(axis=1)
port_var, port_es = calculate_var_es(port_returns, 0.95)
# Average of individual risks
df_results = pd.DataFrame(results)
avg_var = df_results['VaR_95'].mean()
avg_es = df_results['ES_95'].mean()
print(f"\n{'='*50}")
print(f"Portfolio (equal-weight):")
print(f" VaR: {port_var*100:.2f}% (vs avg individual: {avg_var*100:.2f}%)")
print(f" ES: {port_es*100:.2f}% (vs avg individual: {avg_es*100:.2f}%)")
print(f"\nDiversification Benefit:")
print(f" VaR reduction: {(1 - port_var/avg_var)*100:.1f}%")
print(f" ES reduction: {(1 - port_es/avg_es)*100:.1f}%")
8.1.3 Parametric Expected Shortfall
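For normally distributed returns with mean $\mu$ and standard deviation $\sigma$, ES (expressed as a positive loss) has a closed-form solution, where $\phi$ is the standard normal density and $z_{\alpha} = \Phi^{-1}(\alpha)$:
$$ES_{\alpha} = -\mu + \sigma \cdot \frac{\phi(z_{\alpha})}{1-\alpha}$$
The code uses `alpha = 1 - confidence` for the tail probability, so its `alpha` corresponds to $1-\alpha$ in the formula; because $\phi$ is symmetric, evaluating it at the lower-tail quantile gives the same value.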
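As a quick sanity check on the closed form, one can compare it with a brute-force estimate from simulated normal returns. This is a sketch under assumed parameters (`mu`, `sigma`, and the seed are illustrative); it writes ES as a positive loss, so the mean return enters with a negative sign.

```python
import numpy as np
from statistics import NormalDist

mu, sigma = 0.0005, 0.01      # assumed daily mean and volatility of returns
confidence = 0.95
tail = 1 - confidence

# Closed form for normal returns, expressed as a positive loss
z = NormalDist().inv_cdf(tail)                    # lower-tail quantile (negative)
es_closed = -mu + sigma * NormalDist().pdf(z) / tail

# Brute force: average the worst 5% of simulated returns
rng = np.random.default_rng(1)
sims = rng.normal(mu, sigma, 1_000_000)
cutoff = np.percentile(sims, tail * 100)
es_sim = -sims[sims <= cutoff].mean()

print(f"Closed-form ES: {es_closed:.5f}")
print(f"Simulated ES:   {es_sim:.5f}")
```

The two estimates should agree to within simulation noise. A persistent gap would point to a sign or quantile convention mismatch.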
def parametric_es(returns: pd.Series,
confidence: float = 0.95,
distribution: str = 'normal') -> tuple:
"""
Calculate Expected Shortfall using parametric methods.
Args:
returns: Historical returns for parameter estimation
confidence: Confidence level
distribution: 'normal' or 't' for Student-t
Returns:
Tuple of (VaR, ES)
"""
returns_arr = np.array(returns)
mu = np.mean(returns_arr)
sigma = np.std(returns_arr)
alpha = 1 - confidence
if distribution == 'normal':
z_alpha = stats.norm.ppf(alpha)
var = -(mu + sigma * z_alpha)
es = -mu + sigma * stats.norm.pdf(z_alpha) / alpha
else: # Student-t
df, loc, scale = stats.t.fit(returns_arr)
t_alpha = stats.t.ppf(alpha, df)
var = -(loc + scale * t_alpha)
es = -loc + scale * (stats.t.pdf(t_alpha, df) / alpha) * (df + t_alpha**2) / (df - 1)
return var, es
# Compare methods
print("Expected Shortfall Comparison (SPY, 95% confidence)")
print("=" * 55)
var_hist, es_hist = calculate_var_es(spy_returns, 0.95)
var_norm, es_norm = parametric_es(spy_returns, 0.95, 'normal')
var_t, es_t = parametric_es(spy_returns, 0.95, 't')
comparison = pd.DataFrame({
'Method': ['Historical', 'Normal', 'Student-t'],
'VaR (%)': [var_hist*100, var_norm*100, var_t*100],
'ES (%)': [es_hist*100, es_norm*100, es_t*100],
'ES/VaR': [es_hist/var_hist, es_norm/var_norm, es_t/var_t]
})
print(comparison.to_string(index=False))
Exercise 8.1: Multi-Asset Expected Shortfall (Guided)
Your Task: Calculate and compare VaR and ES for all assets in the dataset. Find which asset has the highest ES/VaR ratio (fattest tail).
Fill in the blanks to complete the function:
Solution:
def analyze_tail_risk(returns: pd.DataFrame, confidence: float = 0.95) -> pd.DataFrame:
results = []
for asset in returns.columns:
asset_returns = returns[asset]
alpha = 1 - confidence
var = -np.percentile(asset_returns, alpha * 100)
threshold = np.percentile(asset_returns, alpha * 100)
tail_returns = asset_returns[asset_returns <= threshold]
es = -tail_returns.mean()
results.append({
'Asset': asset,
'VaR': var,
'ES': es,
'ES_VaR_Ratio': es / var
})
return pd.DataFrame(results).sort_values('ES_VaR_Ratio', ascending=False)
risk_analysis = analyze_tail_risk(returns, 0.95)
print("Tail Risk Analysis (95% Confidence)")
print("=" * 50)
for _, row in risk_analysis.iterrows():
print(f"{row['Asset']}: VaR={row['VaR']*100:.2f}%, ES={row['ES']*100:.2f}%, Ratio={row['ES_VaR_Ratio']:.2f}")
print(f"\nFattest tail: {risk_analysis.iloc[0]['Asset']}")
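To judge whether an ES/VaR ratio signals a fat tail, it helps to have a reference point. A minimal sketch, assuming zero-mean normally distributed returns (an idealized benchmark, not a claim about any asset in the dataset), computes the ratio a normal distribution would produce; the function name is hypothetical.

```python
from statistics import NormalDist

def normal_es_var_ratio(confidence: float) -> float:
    """ES/VaR ratio for a zero-mean normal distribution (volatility cancels out)."""
    tail = 1 - confidence
    z = NormalDist().inv_cdf(confidence)        # positive quantile, about 1.645 at 95%
    es_over_sigma = NormalDist().pdf(z) / tail  # ES in units of sigma
    return es_over_sigma / z                    # VaR in units of sigma is z

for conf in (0.95, 0.99):
    print(f"{conf:.0%}: normal ES/VaR = {normal_es_var_ratio(conf):.3f}")
```

Ratios well above the normal benchmark (roughly 1.25 at 95%) suggest a tail heavier than the normal distribution implies.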
Section 8.2: Stress Testing
VaR and ES estimate risk based on historical patterns. But what about unprecedented events? Stress testing evaluates portfolio performance under extreme scenarios.
In this section, you will learn:
- Historical scenario analysis using past crises
- Hypothetical stress testing for unprecedented events
- Sensitivity analysis for factor changes
8.2.1 Historical Scenario Analysis
# Define historical crisis periods
crisis_periods = {
'Global Financial Crisis': ('2008-09-01', '2008-11-30'),
'Flash Crash 2010': ('2010-05-01', '2010-05-31'),
'Euro Debt Crisis': ('2011-08-01', '2011-10-31'),
'China Deval 2015': ('2015-08-01', '2015-09-30'),
'COVID Crash': ('2020-02-15', '2020-03-31'),
'Rate Hike 2022': ('2022-01-01', '2022-06-30')
}
def analyze_crisis_period(returns_df: pd.DataFrame,
start_date: str,
end_date: str) -> dict:
"""Calculate risk metrics during a crisis period."""
crisis_returns = returns_df.loc[start_date:end_date]
if len(crisis_returns) == 0:
return None
cumulative = (1 + crisis_returns).prod() - 1
worst_day = crisis_returns.min()
vol = crisis_returns.std() * np.sqrt(252)
return {
'days': len(crisis_returns),
'cumulative': cumulative,
'worst_day': worst_day,
'annualized_vol': vol
}
# Analyze each crisis for SPY
print("Historical Crisis Analysis: SPY")
print("=" * 70)
crisis_results = []
for crisis_name, (start, end) in crisis_periods.items():
result = analyze_crisis_period(returns['SPY'], start, end)
if result:
crisis_results.append({
'Crisis': crisis_name,
'Days': result['days'],
'Total Return': f"{result['cumulative']*100:.1f}%",
'Worst Day': f"{result['worst_day']*100:.1f}%",
'Ann. Vol': f"{result['annualized_vol']*100:.0f}%"
})
df_crisis = pd.DataFrame(crisis_results)
print(df_crisis.to_string(index=False))
# Portfolio stress test across multiple allocations
def stress_test_portfolio(weights: dict,
returns_df: pd.DataFrame,
crisis_periods: dict) -> pd.DataFrame:
"""
Apply historical crisis scenarios to a portfolio.
Args:
weights: Asset weights dict
returns_df: DataFrame of historical returns
crisis_periods: Dict of crisis name -> (start, end)
Returns:
DataFrame with stress test results
"""
results = []
assets = [a for a in weights.keys() if a in returns_df.columns]
w = np.array([weights[a] for a in assets])
w = w / w.sum()
port_returns = pd.Series(returns_df[assets].values @ w, index=returns_df.index)
for crisis_name, (start, end) in crisis_periods.items():
crisis_ret = port_returns.loc[start:end]
if len(crisis_ret) > 0:
total_return = (1 + crisis_ret).prod() - 1
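wealth = (1 + crisis_ret).cumprod()  # compound simple returns into a wealth path
max_dd = (wealth / wealth.cummax() - 1).min()  # largest peak-to-trough decline in the window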
worst_day = crisis_ret.min()
results.append({
'Scenario': crisis_name,
'Portfolio Return': total_return,
'Worst Day': worst_day,
'Max Drawdown': max_dd
})
return pd.DataFrame(results)
# Define test portfolios
portfolios = {
'Aggressive': {'SPY': 0.7, 'QQQ': 0.3, 'TLT': 0.0, 'GLD': 0.0},
'Balanced': {'SPY': 0.4, 'QQQ': 0.2, 'TLT': 0.3, 'GLD': 0.1},
'Conservative': {'SPY': 0.2, 'QQQ': 0.1, 'TLT': 0.5, 'GLD': 0.2}
}
print("Portfolio Stress Test Results")
print("=" * 70)
for port_name, port_weights in portfolios.items():
print(f"\n{port_name} Portfolio:")
results = stress_test_portfolio(port_weights, returns, crisis_periods)
results['Portfolio Return'] = results['Portfolio Return'].apply(lambda x: f"{x*100:.1f}%")
results['Worst Day'] = results['Worst Day'].apply(lambda x: f"{x*100:.1f}%")
results['Max Drawdown'] = results['Max Drawdown'].apply(lambda x: f"{x*100:.1f}%")
print(results.to_string(index=False))
8.2.2 Hypothetical Stress Testing
def hypothetical_stress_test(weights: dict, scenarios: dict) -> pd.DataFrame:
"""
Apply hypothetical scenarios to a portfolio.
Args:
weights: Asset -> weight
scenarios: Scenario name -> {asset: return}
Returns:
DataFrame with scenario impacts
"""
results = []
for scenario_name, asset_returns in scenarios.items():
port_return = sum(weights.get(a, 0) * asset_returns.get(a, 0)
for a in weights.keys())
results.append({
'Scenario': scenario_name,
'Portfolio Impact': port_return
})
return pd.DataFrame(results)
# Define hypothetical scenarios
hypothetical_scenarios = {
'Market Crash (-20%)': {'SPY': -0.20, 'QQQ': -0.25, 'TLT': 0.05, 'GLD': 0.03},
'Tech Bubble Burst': {'SPY': -0.15, 'QQQ': -0.35, 'TLT': 0.08, 'GLD': 0.05},
'Stagflation': {'SPY': -0.12, 'QQQ': -0.18, 'TLT': -0.15, 'GLD': 0.25},
'Flash Crash (-10% day)': {'SPY': -0.10, 'QQQ': -0.12, 'TLT': 0.02, 'GLD': 0.01},
'Bond Market Crisis': {'SPY': -0.05, 'QQQ': -0.05, 'TLT': -0.25, 'GLD': 0.10},
'Dollar Collapse': {'SPY': -0.08, 'QQQ': -0.08, 'TLT': -0.10, 'GLD': 0.35},
}
print("Hypothetical Stress Test Results")
print("=" * 60)
for port_name, port_weights in portfolios.items():
print(f"\n{port_name} Portfolio:")
results = hypothetical_stress_test(port_weights, hypothetical_scenarios)
results = results.sort_values('Portfolio Impact')
results['Portfolio Impact'] = results['Portfolio Impact'].apply(lambda x: f"{x*100:+.1f}%")
print(results.to_string(index=False))
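The section outline also lists sensitivity analysis, which the scenarios above only touch on. A minimal sketch, assuming a purely linear portfolio (no options, no rebalancing) and a hypothetical helper `sensitivity_table`, shocks one asset at a time and holds the others flat:

```python
def sensitivity_table(weights: dict, shocks: list) -> dict:
    """
    Portfolio return when each asset alone moves by each shock.

    Assumes linear exposure: impact = weight * shock, other assets unchanged.
    """
    return {
        asset: {shock: weight * shock for shock in shocks}
        for asset, weight in weights.items()
    }

# Weights mirror the 'Balanced' portfolio defined earlier in this module
balanced = {'SPY': 0.4, 'QQQ': 0.2, 'TLT': 0.3, 'GLD': 0.1}
shocks = [-0.20, -0.10, -0.05, 0.05, 0.10]

table = sensitivity_table(balanced, shocks)
for asset, impacts in table.items():
    row = "  ".join(f"{impact * 100:+6.1f}%" for impact in impacts.values())
    print(f"{asset}: {row}")
```

Unlike a scenario, which moves every asset at once, this view isolates which single exposure dominates the portfolio's response.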
Exercise 8.2: Custom Stress Scenario (Guided)
Your Task: Design and test a geopolitical crisis scenario that affects all assets.
Fill in the blanks to complete the implementation:
Solution:
def create_geopolitical_scenarios() -> dict:
scenarios = {
'Mild Tension': {
'SPY': -0.05,
'QQQ': -0.07,
'TLT': 0.03,
'GLD': 0.08
},
'Major Crisis': {
'SPY': -0.15,
'QQQ': -0.20,
'TLT': 0.10,
'GLD': 0.20
},
'Severe Conflict': {
'SPY': -0.25,
'QQQ': -0.30,
'TLT': 0.08,
'GLD': 0.35
}
}
return scenarios
# Test
geo_scenarios = create_geopolitical_scenarios()
print("Geopolitical Crisis Stress Test")
print("=" * 50)
for port_name, port_weights in portfolios.items():
results = hypothetical_stress_test(port_weights, geo_scenarios)
worst = results['Portfolio Impact'].min()
best = results['Portfolio Impact'].max()
print(f"{port_name}: Best {best*100:+.1f}%, Worst {worst*100:.1f}%")
Section 8.3: Drawdown Analysis
VaR measures single-period risk. But investors often care more about cumulative losses over time: how far can the portfolio fall, and how long until recovery?
In this section, you will learn:
- Maximum drawdown calculation
- Drawdown duration analysis
- Conditional Drawdown at Risk (CDaR)
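A toy wealth path makes the drawdown arithmetic concrete before it is applied to real data. The values below are hypothetical and chosen only so the numbers are easy to check by hand.

```python
import numpy as np

# Hypothetical portfolio values, chosen only to make the arithmetic easy to follow
wealth = np.array([100.0, 120.0, 90.0, 110.0, 130.0, 104.0])

running_max = np.maximum.accumulate(wealth)   # highest value seen so far
drawdown = wealth / running_max - 1           # 0 at a new high, negative below it

max_dd = drawdown.min()
trough = int(drawdown.argmin())               # position of the deepest drawdown
peak = int(wealth[:trough + 1].argmax())      # highest point before that trough

print("Running max:", running_max)
print("Drawdown:   ", np.round(drawdown, 4))
print(f"Max drawdown: {max_dd:.1%} (peak at index {peak}, trough at index {trough})")
```

The fall from 120 to 90 is the maximum drawdown (25%), even though the later drop from 130 to 104 (20%) starts from a higher level. Drawdown is always measured against the running peak.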
def calculate_drawdowns(returns: pd.Series) -> dict:
"""
Calculate drawdown series and statistics.
Args:
returns: Daily returns
Returns:
Dictionary with drawdown metrics and series
"""
cum_returns = (1 + returns).cumprod()
running_max = cum_returns.cummax()
drawdown = (cum_returns - running_max) / running_max
max_dd = drawdown.min()
max_dd_end = drawdown.idxmin()
peak_idx = cum_returns.loc[:max_dd_end].idxmax()
# Find recovery
peak_value = cum_returns.loc[peak_idx]
after_trough = cum_returns.loc[max_dd_end:]
recovery = after_trough[after_trough >= peak_value]
recovery_date = recovery.index[0] if len(recovery) > 0 else None
if recovery_date:
duration = (recovery_date - peak_idx).days
else:
duration = (returns.index[-1] - peak_idx).days
return {
'max_drawdown': max_dd,
'peak_date': peak_idx,
'trough_date': max_dd_end,
'recovery_date': recovery_date,
'duration_days': duration,
'drawdown_series': drawdown,
'wealth_curve': cum_returns
}
# Calculate for SPY
spy_dd = calculate_drawdowns(returns['SPY'])
print("SPY Maximum Drawdown Analysis")
print("=" * 50)
print(f"Maximum Drawdown: {spy_dd['max_drawdown']*100:.1f}%")
print(f"Peak Date: {spy_dd['peak_date'].strftime('%Y-%m-%d')}")
print(f"Trough Date: {spy_dd['trough_date'].strftime('%Y-%m-%d')}")
if spy_dd['recovery_date']:
print(f"Recovery Date: {spy_dd['recovery_date'].strftime('%Y-%m-%d')}")
else:
print("Recovery Date: Not yet recovered")
print(f"Duration: {spy_dd['duration_days']} days ({spy_dd['duration_days']/365:.1f} years)")
# Visualize drawdowns
fig, axes = plt.subplots(2, 1, figsize=(14, 8))
# Top: Wealth curve
ax1 = axes[0]
ax1.plot(spy_dd['wealth_curve'], label='SPY Cumulative Return', linewidth=1.5)
ax1.fill_between(spy_dd['wealth_curve'].index,
spy_dd['wealth_curve'].cummax(),
spy_dd['wealth_curve'],
alpha=0.3, color='red', label='Drawdown Area')
ax1.set_ylabel('Cumulative Return (Growth of $1)')
ax1.set_title('SPY Wealth Curve with Drawdown Periods')
ax1.legend(loc='upper left')
ax1.grid(True, alpha=0.3)
# Bottom: Drawdown series
ax2 = axes[1]
ax2.fill_between(spy_dd['drawdown_series'].index,
spy_dd['drawdown_series'] * 100,
0, alpha=0.7, color='crimson')
ax2.axhline(spy_dd['max_drawdown'] * 100, color='darkred', linestyle='--',
label=f'Max DD: {spy_dd["max_drawdown"]*100:.1f}%')
ax2.set_ylabel('Drawdown (%)')
ax2.set_xlabel('Date')
ax2.set_title('SPY Drawdown History')
ax2.legend()
ax2.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
def calculate_cdar(returns: pd.Series, confidence: float = 0.95) -> tuple:
"""
Calculate Conditional Drawdown at Risk.
Args:
returns: Daily returns
confidence: Confidence level
Returns:
Tuple of (DaR, CDaR)
"""
cum_returns = (1 + returns).cumprod()
running_max = cum_returns.cummax()
drawdowns = (cum_returns - running_max) / running_max
alpha = 1 - confidence
dar = -np.percentile(drawdowns, alpha * 100)
threshold = np.percentile(drawdowns, alpha * 100)
worst_drawdowns = drawdowns[drawdowns <= threshold]
cdar = -np.mean(worst_drawdowns)
return dar, cdar
# Calculate for all assets
print("Drawdown Risk Metrics Comparison")
print("=" * 60)
dd_metrics = []
for asset in returns.columns:
dd_info = calculate_drawdowns(returns[asset])
dar_95, cdar_95 = calculate_cdar(returns[asset], 0.95)
dd_metrics.append({
'Asset': asset,
'Max DD': f"{dd_info['max_drawdown']*100:.1f}%",
'DaR (95%)': f"{dar_95*100:.1f}%",
'CDaR (95%)': f"{cdar_95*100:.1f}%"
})
df_dd = pd.DataFrame(dd_metrics)
print(df_dd.to_string(index=False))
Exercise 8.3: Portfolio Drawdown Comparison (Open-ended)
Your Task:
Build a function that compares drawdown characteristics across multiple portfolios:
- Calculate max drawdown, DaR, and CDaR for each portfolio
- Create a visualization showing drawdown series for all portfolios
- Rank portfolios by drawdown profile
Your implementation:
Solution:
def compare_portfolio_drawdowns(portfolios: dict,
returns_df: pd.DataFrame) -> pd.DataFrame:
"""
Compare drawdown metrics across portfolios.
Args:
portfolios: Dict of portfolio name -> weights dict
returns_df: DataFrame of asset returns
Returns:
DataFrame with drawdown metrics for each portfolio
"""
results = []
dd_series_dict = {}
for port_name, weights in portfolios.items():
# Calculate portfolio returns
assets = [a for a in weights.keys() if a in returns_df.columns]
w = np.array([weights[a] for a in assets])
w = w / w.sum()
port_ret = pd.Series(returns_df[assets].values @ w, index=returns_df.index)
# Calculate metrics
dd_info = calculate_drawdowns(port_ret)
dar, cdar = calculate_cdar(port_ret, 0.95)
results.append({
'Portfolio': port_name,
'Max DD': dd_info['max_drawdown'] * 100,
'DaR 95%': dar * 100,
'CDaR 95%': cdar * 100,
'Duration': dd_info['duration_days']
})
dd_series_dict[port_name] = dd_info['drawdown_series']
# Plot comparison
fig, ax = plt.subplots(figsize=(14, 6))
colors = {'Aggressive': 'crimson', 'Balanced': 'steelblue', 'Conservative': 'green'}
for port_name, dd_series in dd_series_dict.items():
ax.plot(dd_series * 100, label=port_name, linewidth=1.5,
color=colors.get(port_name, 'gray'))
ax.axhline(0, color='black', linestyle='-', linewidth=0.5)
ax.set_ylabel('Drawdown (%)')
ax.set_xlabel('Date')
ax.set_title('Portfolio Drawdown Comparison')
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
return pd.DataFrame(results).sort_values('Max DD')
# Test
dd_comparison = compare_portfolio_drawdowns(portfolios, returns)
print("Portfolio Drawdown Comparison")
print("=" * 60)
print(dd_comparison.to_string(index=False))
Section 8.4: Tail Risk Measures
Beyond VaR and ES, several specialized metrics help quantify tail risk and return asymmetry.
In this section, you will learn:
- Tail ratio and gain/loss metrics
- Sortino ratio (downside deviation only)
- Omega ratio (full distribution)
def tail_ratio(returns: pd.Series, percentile: int = 5) -> float:
"""
Calculate the tail ratio.
Tail Ratio = 95th percentile / |5th percentile|
> 1.0 means positive skew (gains > losses in tails)
"""
returns_arr = np.array(returns)
right_tail = np.percentile(returns_arr, 100 - percentile)
left_tail = np.percentile(returns_arr, percentile)
return right_tail / abs(left_tail) if left_tail != 0 else np.inf
def sortino_ratio(returns: pd.Series, risk_free_rate: float = 0) -> float:
    """
    Calculate the annualized Sortino ratio (only penalizes downside volatility).
    """
    returns_arr = np.array(returns)
    # Daily excess return over the annual risk-free rate
    excess = returns_arr - risk_free_rate / 252
    # Downside deviation: root mean square of negative excess returns,
    # averaged over all observations
    downside = np.minimum(excess, 0)
    downside_std = max(np.sqrt(np.mean(downside ** 2)) * np.sqrt(252), 1e-10)
    return (np.mean(excess) * 252) / downside_std
def omega_ratio(returns: pd.Series, threshold: float = 0) -> float:
"""
Calculate Omega ratio.
Omega = Probability-weighted gains above threshold /
Probability-weighted losses below threshold
"""
returns_arr = np.array(returns)
gains = returns_arr[returns_arr > threshold] - threshold
losses = threshold - returns_arr[returns_arr <= threshold]
sum_gains = np.sum(gains) if len(gains) > 0 else 0
sum_losses = np.sum(losses) if len(losses) > 0 else 1e-10
return sum_gains / sum_losses
# Calculate for all assets
print("Advanced Risk-Adjusted Metrics")
print("=" * 60)
advanced_metrics = []
for asset in returns.columns:
ret = returns[asset]
sharpe = (ret.mean() * 252) / (ret.std() * np.sqrt(252))
sortino = sortino_ratio(ret)
omega = omega_ratio(ret)
tr = tail_ratio(ret)
dd = calculate_drawdowns(ret)
calmar = (ret.mean() * 252) / abs(dd['max_drawdown'])
advanced_metrics.append({
'Asset': asset,
'Sharpe': sharpe,
'Sortino': sortino,
'Omega': omega,
'Calmar': calmar,
'Tail Ratio': tr
})
df_advanced = pd.DataFrame(advanced_metrics)
print(df_advanced.to_string(index=False, float_format=lambda x: f"{x:.2f}"))
Exercise 8.4: Gain/Loss Analysis (Guided)
Your Task: Calculate gain/loss metrics including win rate, average gain/loss, and profit factor.
Fill in the blanks to complete the function:
Solution:
def gain_loss_analysis(returns: pd.Series) -> dict:
returns_arr = np.array(returns)
gains = returns_arr[returns_arr > 0]
losses = returns_arr[returns_arr < 0]
win_rate = len(gains) / len(returns_arr)
avg_gain = np.mean(gains) if len(gains) > 0 else 0
avg_loss = np.mean(losses) if len(losses) > 0 else 0
gl_ratio = avg_gain / abs(avg_loss) if avg_loss != 0 else np.inf
profit_factor = np.sum(gains) / abs(np.sum(losses)) if np.sum(losses) != 0 else np.inf
return {
'win_rate': win_rate,
'avg_gain': avg_gain,
'avg_loss': avg_loss,
'gain_loss_ratio': gl_ratio,
'profit_factor': profit_factor
}
# Test for all assets
print("Gain/Loss Analysis")
print("=" * 60)
for asset in returns.columns:
m = gain_loss_analysis(returns[asset])
print(f"{asset}: Win={m['win_rate']*100:.1f}%, G/L={m['gain_loss_ratio']:.2f}, PF={m['profit_factor']:.2f}")
Exercise 8.5: Comprehensive Risk Report (Open-ended)
Your Task:
Build a function that generates a comprehensive risk report including:
- Basic statistics (return, volatility, skewness, kurtosis)
- VaR and ES at multiple confidence levels
- Drawdown metrics
- All risk-adjusted ratios (Sharpe, Sortino, Calmar, Omega)
Your implementation:
Solution:
def comprehensive_risk_report(returns: pd.Series, name: str = "Portfolio") -> dict:
"""
Generate comprehensive risk report.
Args:
returns: Return series
name: Name for display
Returns:
Dictionary with all risk metrics
"""
ret = np.array(returns)
# Basic stats
ann_return = np.mean(ret) * 252
ann_vol = np.std(ret) * np.sqrt(252)
skewness = stats.skew(ret)
kurtosis = stats.kurtosis(ret)
# VaR and ES
var_95, es_95 = calculate_var_es(ret, 0.95)
var_99, es_99 = calculate_var_es(ret, 0.99)
# Drawdown
dd_info = calculate_drawdowns(returns)
dar_95, cdar_95 = calculate_cdar(returns, 0.95)
# Ratios
sharpe = ann_return / ann_vol if ann_vol > 0 else 0
sortino = sortino_ratio(ret)
omega = omega_ratio(ret)
calmar = ann_return / abs(dd_info['max_drawdown']) if dd_info['max_drawdown'] != 0 else 0
tr = tail_ratio(ret)
# Gain/Loss
gl = gain_loss_analysis(ret)
print(f"\n{'='*60}")
print(f"RISK REPORT: {name}")
print(f"{'='*60}")
print(f"\nRETURN STATISTICS")
print(f" Annual Return: {ann_return*100:>8.2f}%")
print(f" Annual Volatility: {ann_vol*100:>8.2f}%")
print(f" Skewness: {skewness:>8.2f}")
print(f" Excess Kurtosis: {kurtosis:>8.2f}")
print(f"\nVALUE AT RISK")
print(f" VaR (95%): {var_95*100:>8.2f}%")
print(f" ES (95%): {es_95*100:>8.2f}%")
print(f" VaR (99%): {var_99*100:>8.2f}%")
print(f" ES (99%): {es_99*100:>8.2f}%")
print(f"\nDRAWDOWN METRICS")
print(f" Max Drawdown: {dd_info['max_drawdown']*100:>8.2f}%")
print(f" DaR (95%): {dar_95*100:>8.2f}%")
print(f" CDaR (95%): {cdar_95*100:>8.2f}%")
print(f"\nRISK-ADJUSTED RATIOS")
print(f" Sharpe: {sharpe:>8.2f}")
print(f" Sortino: {sortino:>8.2f}")
print(f" Omega: {omega:>8.2f}")
print(f" Calmar: {calmar:>8.2f}")
print(f"\nTAIL RISK")
print(f" Tail Ratio: {tr:>8.2f}")
print(f" Win Rate: {gl['win_rate']*100:>8.1f}%")
print(f" Profit Factor: {gl['profit_factor']:>8.2f}")
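# Return every metric computed above, as the docstring promises
return {
'ann_return': ann_return, 'ann_vol': ann_vol,
'skewness': skewness, 'kurtosis': kurtosis,
'var_95': var_95, 'es_95': es_95, 'var_99': var_99, 'es_99': es_99,
'max_drawdown': dd_info['max_drawdown'], 'dar_95': dar_95, 'cdar_95': cdar_95,
'sharpe': sharpe, 'sortino': sortino, 'omega': omega, 'calmar': calmar,
'tail_ratio': tr, 'win_rate': gl['win_rate'], 'profit_factor': gl['profit_factor']
}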
# Generate reports
for asset in returns.columns:
comprehensive_risk_report(returns[asset], asset)
Exercise 8.6: Risk Dashboard Class (Open-ended)
Your Task:
Build a comprehensive RiskDashboard class that:
- Calculates all risk metrics (VaR, ES, drawdowns, ratios)
- Supports stress testing
- Generates visualizations
- Produces a summary report
Your implementation:
Solution:
class RiskDashboard:
"""
Comprehensive risk analysis dashboard.
"""
def __init__(self, returns: pd.Series, name: str = "Portfolio"):
self.returns = returns
self.name = name
self._calculate_metrics()
def _calculate_metrics(self):
ret = np.array(self.returns)
# Basic stats
self.ann_return = np.mean(ret) * 252
self.ann_vol = np.std(ret) * np.sqrt(252)
self.skewness = stats.skew(ret)
self.kurtosis = stats.kurtosis(ret)
# VaR/ES
self.var_95, self.es_95 = calculate_var_es(ret, 0.95)
self.var_99, self.es_99 = calculate_var_es(ret, 0.99)
# Drawdown
dd = calculate_drawdowns(self.returns)
self.max_drawdown = dd['max_drawdown']
self.drawdown_series = dd['drawdown_series']
self.dar_95, self.cdar_95 = calculate_cdar(self.returns, 0.95)
# Ratios
self.sharpe = self.ann_return / self.ann_vol if self.ann_vol > 0 else 0
self.sortino = sortino_ratio(ret)
self.omega = omega_ratio(ret)
self.calmar = self.ann_return / abs(self.max_drawdown) if self.max_drawdown != 0 else 0
self.tail_ratio = tail_ratio(ret)
def stress_test(self, scenarios: dict) -> pd.DataFrame:
"""Apply stress scenarios."""
results = []
for name, impact in scenarios.items():
results.append({'Scenario': name, 'Impact': impact})
return pd.DataFrame(results)
def plot_dashboard(self, figsize=(14, 10)):
"""Create visual dashboard."""
fig, axes = plt.subplots(2, 2, figsize=figsize)
fig.suptitle(f'Risk Dashboard: {self.name}', fontsize=14, fontweight='bold')
# Distribution with VaR/ES
ax1 = axes[0, 0]
ax1.hist(self.returns, bins=50, density=True, alpha=0.7, color='steelblue')
ax1.axvline(-self.var_95, color='orange', linestyle='--', linewidth=2)
ax1.axvline(-self.es_95, color='red', linestyle='-', linewidth=2)
ax1.set_xlabel('Daily Return')
ax1.set_title('Return Distribution with Risk Measures')
# Drawdown
ax2 = axes[0, 1]
ax2.fill_between(self.drawdown_series.index, self.drawdown_series * 100, 0,
alpha=0.7, color='crimson')
ax2.set_xlabel('Date')
ax2.set_ylabel('Drawdown (%)')
ax2.set_title('Historical Drawdowns')
# Rolling volatility
ax3 = axes[1, 0]
rolling_vol = self.returns.rolling(21).std() * np.sqrt(252) * 100
ax3.plot(rolling_vol.index, rolling_vol, linewidth=1)
ax3.set_xlabel('Date')
ax3.set_ylabel('Volatility (%)')
ax3.set_title('Rolling 21-day Volatility')
# Metrics summary
ax4 = axes[1, 1]
ax4.axis('off')
summary = f"""
RISK METRICS SUMMARY
{'='*30}
Return & Volatility
Annual Return: {self.ann_return*100:>8.2f}%
Annual Volatility: {self.ann_vol*100:>8.2f}%
Downside Risk
VaR (95%): {self.var_95*100:>8.2f}%
ES (95%): {self.es_95*100:>8.2f}%
Max Drawdown: {self.max_drawdown*100:>8.2f}%
Risk-Adjusted Ratios
Sharpe: {self.sharpe:>8.2f}
Sortino: {self.sortino:>8.2f}
Calmar: {self.calmar:>8.2f}
"""
ax4.text(0.1, 0.95, summary, transform=ax4.transAxes, fontsize=10,
verticalalignment='top', fontfamily='monospace')
plt.tight_layout()
plt.subplots_adjust(top=0.92)
plt.show()
# Test
dashboard = RiskDashboard(portfolio_returns, "Balanced Portfolio")
dashboard.plot_dashboard()
Module Project: Production Risk Management System
Build a comprehensive risk management system that integrates all concepts from this module.
# YOUR CODE HERE - Module Project
Solution:
class ProductionRiskSystem:
"""
Production-ready risk management system.
Features:
- VaR and Expected Shortfall calculation
- Stress testing (historical and hypothetical)
- Drawdown analysis
- Comprehensive risk-adjusted metrics
- Portfolio comparison
"""
def __init__(self, returns: pd.DataFrame,
weights: np.ndarray = None,
portfolio_value: float = 1_000_000):
self.returns = returns
self.assets = list(returns.columns)
self.weights = weights if weights is not None else \
np.ones(len(self.assets)) / len(self.assets)
self.portfolio_value = portfolio_value
self.portfolio_returns = pd.Series(
returns.values @ self.weights,
index=returns.index
)
self._calculate_all_metrics()
def _calculate_all_metrics(self):
"""Calculate all risk metrics."""
ret = np.array(self.portfolio_returns)
# Basic
self.ann_return = np.mean(ret) * 252
self.ann_vol = np.std(ret) * np.sqrt(252)
# VaR/ES
self.var_95, self.es_95 = calculate_var_es(ret, 0.95)
self.var_99, self.es_99 = calculate_var_es(ret, 0.99)
# Drawdown
dd = calculate_drawdowns(self.portfolio_returns)
self.max_drawdown = dd['max_drawdown']
self.drawdown_series = dd['drawdown_series']
self.dar_95, self.cdar_95 = calculate_cdar(self.portfolio_returns, 0.95)
# Ratios
self.sharpe = self.ann_return / self.ann_vol if self.ann_vol > 0 else 0
self.sortino = sortino_ratio(ret)
self.omega = omega_ratio(ret)
self.calmar = self.ann_return / abs(self.max_drawdown) if self.max_drawdown != 0 else 0
def var_report(self):
"""Generate VaR report."""
print("\nVALUE AT RISK REPORT")
print("=" * 50)
print(f"Portfolio Value: ${self.portfolio_value:,.0f}")
print(f"\n95% Confidence:")
print(f" VaR: {self.var_95*100:.2f}% (${self.var_95*self.portfolio_value:,.0f})")
print(f" ES: {self.es_95*100:.2f}% (${self.es_95*self.portfolio_value:,.0f})")
print(f"\n99% Confidence:")
print(f" VaR: {self.var_99*100:.2f}% (${self.var_99*self.portfolio_value:,.0f})")
print(f" ES: {self.es_99*100:.2f}% (${self.es_99*self.portfolio_value:,.0f})")
def stress_test(self, scenarios: dict):
"""Run stress tests."""
print("\nSTRESS TEST RESULTS")
print("=" * 50)
for name, asset_impacts in scenarios.items():
impact = sum(self.weights[i] * asset_impacts.get(a, 0)
for i, a in enumerate(self.assets))
dollar_impact = impact * self.portfolio_value
print(f"{name}: {impact*100:+.1f}% (${dollar_impact:+,.0f})")
def drawdown_report(self):
"""Generate drawdown report."""
print("\nDRAWDOWN REPORT")
print("=" * 50)
print(f"Max Drawdown: {self.max_drawdown*100:.1f}%")
print(f"DaR (95%): {self.dar_95*100:.1f}%")
print(f"CDaR (95%): {self.cdar_95*100:.1f}%")
def performance_report(self):
"""Generate performance report."""
print("\nPERFORMANCE REPORT")
print("=" * 50)
print(f"Annual Return: {self.ann_return*100:.2f}%")
print(f"Annual Volatility: {self.ann_vol*100:.2f}%")
print(f"\nRisk-Adjusted Ratios:")
print(f" Sharpe: {self.sharpe:.2f}")
print(f" Sortino: {self.sortino:.2f}")
print(f" Omega: {self.omega:.2f}")
print(f" Calmar: {self.calmar:.2f}")
def full_report(self):
"""Generate comprehensive report."""
print("\n" + "=" * 60)
print("PRODUCTION RISK MANAGEMENT REPORT")
print("=" * 60)
print(f"\nPortfolio: {dict(zip(self.assets, self.weights))}")
print(f"Portfolio Value: ${self.portfolio_value:,.0f}")
self.performance_report()
self.var_report()
self.drawdown_report()
# Stress test with default scenarios
scenarios = {
'Market Crash': {'SPY': -0.20, 'QQQ': -0.25, 'TLT': 0.05, 'GLD': 0.03},
'Stagflation': {'SPY': -0.12, 'QQQ': -0.18, 'TLT': -0.15, 'GLD': 0.25}
}
self.stress_test(scenarios)
# Test
system = ProductionRiskSystem(
returns=returns,
weights=weights,
portfolio_value=1_000_000
)
system.full_report()
Key Takeaways
What You Learned
1. Expected Shortfall (CVaR)
- Measures average loss when VaR is breached
- Preferred by regulators because it is a coherent (subadditive) risk measure, unlike VaR
- Always greater than or equal to VaR at the same confidence level
2. Stress Testing
- Historical scenarios apply real crisis returns
- Hypothetical scenarios test unprecedented events
- No single portfolio is safe in all scenarios
3. Drawdown Analysis
- Maximum drawdown captures peak-to-trough loss
- Duration matters as much as magnitude
- CDaR extends ES concept to drawdowns
4. Tail Risk Measures
- Tail ratio compares extreme gains to losses
- Sortino penalizes only downside volatility
- Omega considers the entire distribution
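The first takeaway, that ES is never smaller than VaR at the same confidence level, can be checked numerically. The sketch below uses simulated fat-tailed returns and a hypothetical helper `hist_var_es`. It stands in for the module's `calculate_var_es`, whose exact signature is not assumed here, and reports both measures as positive loss fractions using the historical (empirical quantile) method.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical fat-tailed daily returns (Student-t, 4 d.o.f.), illustration only
sim = 0.01 * rng.standard_t(df=4, size=5000)

def hist_var_es(r: np.ndarray, confidence: float) -> tuple:
    """Historical VaR and ES, both reported as positive loss fractions."""
    cutoff = np.percentile(r, 100 * (1 - confidence))
    var = -cutoff
    # ES averages the returns at or below the VaR cutoff
    es = -r[r <= cutoff].mean()
    return var, es

v95, e95 = hist_var_es(sim, 0.95)
v99, e99 = hist_var_es(sim, 0.99)
print(f"95%: VaR={v95:.4f}, ES={e95:.4f}")
print(f"99%: VaR={v99:.4f}, ES={e99:.4f}")
```

Because ES averages only the returns at or beyond the cutoff, it cannot be smaller than VaR under this method. The gap between the two widens as the tails get fatter.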
Coming Up Next
In Module 9: Factor Models, we'll explore:
- CAPM and beta estimation
- Fama-French multi-factor models
- Factor-based portfolio construction
- Style analysis and attribution
Congratulations on completing Module 8!
Module 9: Factor Models
Course 3: Quantitative Finance & Portfolio Theory
Part 3: Risk Modeling
Learning Objectives
By the end of this module, you will be able to:
- Understand and implement CAPM for beta estimation
- Apply multi-factor models (Fama-French 3-factor and 5-factor)
- Calculate alpha and evaluate statistical significance
- Perform factor-based portfolio analysis and attribution
| Attribute | Value |
|---|---|
| Duration | ~2.5 hours |
| Exercises | 6 (3 guided + 3 open-ended) |
| Prerequisites | Module 8: Beyond VaR |
Setup and Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
from datetime import datetime, timedelta
from scipy import stats
import statsmodels.api as sm
import warnings
warnings.filterwarnings('ignore')
pd.set_option('display.float_format', '{:.4f}'.format)
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
np.random.seed(42)
print('Libraries loaded successfully!')
Load Data
# Download stock and market data
tickers = ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'TSLA', 'JNJ', 'JPM', 'XOM']
market_ticker = 'SPY'
end_date = datetime.now()
start_date = end_date - timedelta(days=5*365)
print("Downloading market and stock data...")
all_tickers = tickers + [market_ticker]
data = yf.download(all_tickers, start=start_date, end=end_date, progress=False)
# Handle MultiIndex columns
if isinstance(data.columns, pd.MultiIndex):
if 'Adj Close' in data.columns.get_level_values(0):
prices = data['Adj Close']
elif 'Close' in data.columns.get_level_values(0):
prices = data['Close']
else:
prices = data.iloc[:, :len(all_tickers)]
else:
prices = data['Adj Close'] if 'Adj Close' in data.columns else data['Close']
prices.columns = [str(col) for col in prices.columns]
# Calculate returns
returns = prices.pct_change().dropna()
# Separate market and stock returns
market_returns = returns[market_ticker]
stock_returns = returns[tickers]
# Risk-free rate
risk_free_rate = 0.04
daily_rf = risk_free_rate / 252
# Calculate excess returns
market_excess = market_returns - daily_rf
stock_excess = stock_returns.sub(daily_rf, axis=0)
print(f"\nData loaded: {len(prices)} trading days")
print(f"Stocks: {tickers}")
print(f"Market proxy: {market_ticker}")
Section 9.1: CAPM and Beta
The Capital Asset Pricing Model (CAPM) states that an asset's expected return is determined by a single factor: its sensitivity to market risk (beta).
$$E[R_i] = R_f + \beta_i (E[R_m] - R_f)$$
In this section, you will learn:
- Beta calculation methods
- Rolling beta for time-varying risk
- Interpreting beta components
9.1.1 Beta Calculation
Beta measures sensitivity to market movements:
$$\beta_i = \frac{Cov(R_i, R_m)}{Var(R_m)} = \rho_{i,m} \cdot \frac{\sigma_i}{\sigma_m}$$
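The two forms of the formula, and the slope of an OLS regression of the stock on the market, are algebraically the same quantity. The sketch below checks this on synthetic data. The series `r_m` and `r_i` and the true beta of 1.3 are assumptions made up for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: market returns and a stock with true beta 1.3 plus noise
r_m = rng.normal(0.0004, 0.01, 1000)
r_i = 1.3 * r_m + rng.normal(0.0, 0.012, 1000)

# (1) Covariance over variance (sample estimates, ddof=1)
beta_cov = np.cov(r_i, r_m, ddof=1)[0, 1] / np.var(r_m, ddof=1)

# (2) Correlation times the volatility ratio
rho = np.corrcoef(r_i, r_m)[0, 1]
beta_corr = rho * np.std(r_i, ddof=1) / np.std(r_m, ddof=1)

# (3) OLS slope of r_i on r_m with an intercept
X = np.column_stack([np.ones_like(r_m), r_m])
intercept, beta_ols = np.linalg.lstsq(X, r_i, rcond=None)[0]

print(f"Cov/Var:      {beta_cov:.4f}")
print(f"Corr x ratio: {beta_corr:.4f}")
print(f"OLS slope:    {beta_ols:.4f}")
```

All three estimates agree to floating-point precision. They differ from the true value of 1.3 only through sampling noise.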
def calculate_beta_cov(stock_returns: pd.Series,
market_returns: pd.Series) -> float:
"""Calculate beta using covariance method."""
cov = stock_returns.cov(market_returns)
var = market_returns.var()
return cov / var
def calculate_beta_regression(stock_returns: pd.Series,
market_returns: pd.Series) -> tuple:
"""Calculate beta using OLS regression."""
X = sm.add_constant(market_returns)
model = sm.OLS(stock_returns, X).fit()
return model.params.iloc[1], model
# Calculate betas for all stocks
print("Beta Calculation Comparison")
print("=" * 60)
print(f"{'Ticker':<8} {'Cov/Var':>10} {'Regression':>12} {'R-squared':>12}")
print("-" * 60)
betas = {}
models = {}
for ticker in tickers:
beta_cov = calculate_beta_cov(stock_returns[ticker], market_returns)
beta_reg, model = calculate_beta_regression(stock_excess[ticker], market_excess)
betas[ticker] = beta_reg
models[ticker] = model
print(f"{ticker:<8} {beta_cov:>10.3f} {beta_reg:>12.3f} {model.rsquared:>12.3f}")
# Visualize the Security Market Line
plt.figure(figsize=(12, 8))
# Calculate annual market return and risk premium
annual_market_return = market_returns.mean() * 252
market_risk_premium = annual_market_return - risk_free_rate
# SML line
beta_range = np.linspace(0, 2.5, 100)
sml_returns = risk_free_rate + beta_range * market_risk_premium
plt.plot(beta_range, sml_returns, 'b-', linewidth=2, label='Security Market Line')
# Plot each stock
for ticker in tickers:
beta = betas[ticker]
actual_return = stock_returns[ticker].mean() * 252
expected_return = risk_free_rate + beta * market_risk_premium
color = 'green' if actual_return > expected_return else 'red'
plt.scatter(beta, actual_return, s=150, c=color, edgecolors='black',
linewidth=1.5, zorder=5)
plt.annotate(ticker, (beta, actual_return), xytext=(5, 5),
textcoords='offset points', fontsize=10)
plt.scatter(1, annual_market_return, s=200, marker='D', c='blue',
edgecolors='black', linewidth=2, zorder=5, label='Market (β=1)')
plt.xlabel('Beta (β)', fontsize=12)
plt.ylabel('Annual Return', fontsize=12)
plt.title('Security Market Line\nGreen = Positive Alpha, Red = Negative Alpha',
fontsize=14, fontweight='bold')
plt.legend(loc='upper left')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
9.1.2 Rolling Beta
Beta is not constant over time. Rolling beta shows how market sensitivity evolves.
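To see what a rolling estimate can and cannot detect, consider a synthetic stock whose true beta jumps from 0.5 to 1.5 halfway through the sample. This is a minimal sketch: the series, the jump, and the 126-day window are all assumptions chosen for the illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 1000
idx = pd.bdate_range("2020-01-01", periods=n)
mkt = pd.Series(rng.normal(0.0, 0.01, n), index=idx)

# Hypothetical stock whose true beta jumps from 0.5 to 1.5 halfway through
true_beta = np.where(np.arange(n) < n // 2, 0.5, 1.5)
stock = pd.Series(true_beta * mkt.values + rng.normal(0.0, 0.005, n), index=idx)

window = 126
# Rolling covariance over rolling variance gives the rolling beta
roll_beta = stock.rolling(window).cov(mkt) / mkt.rolling(window).var()

print(f"Just before the shift: {roll_beta.iloc[n // 2 - 1]:.2f}")
print(f"End of sample:         {roll_beta.iloc[-1]:.2f}")
```

The estimate moves gradually over roughly one window length after the shift, because each window mixes observations from both regimes. A shorter window reacts faster but is noisier.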
def calculate_rolling_beta(stock_returns: pd.Series,
market_returns: pd.Series,
window: int = 252) -> pd.Series:
"""Calculate rolling beta over a specified window."""
rolling_cov = stock_returns.rolling(window=window).cov(market_returns)
rolling_var = market_returns.rolling(window=window).var()
return rolling_cov / rolling_var
# Calculate rolling betas
window = 252
rolling_betas = pd.DataFrame()
for ticker in tickers:
rolling_betas[ticker] = calculate_rolling_beta(
stock_returns[ticker], market_returns, window
)
rolling_betas = rolling_betas.dropna()
# Plot rolling betas
fig, ax = plt.subplots(figsize=(14, 8))
for ticker in tickers:
ax.plot(rolling_betas.index, rolling_betas[ticker], label=ticker, alpha=0.7)
ax.axhline(y=1, color='black', linestyle='--', linewidth=2, alpha=0.5, label='Market (β=1)')
ax.set_xlabel('Date')
ax.set_ylabel('Rolling Beta (252-day)')
ax.set_title('Rolling Beta Over Time', fontsize=14, fontweight='bold')
ax.legend(loc='upper right', ncol=3)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Exercise 9.1: Beta Component Analysis (Guided)
Your Task: Break down beta into its components: correlation and relative volatility.
$$\beta = \rho_{i,m} \times \frac{\sigma_i}{\sigma_m}$$
Fill in the blanks to complete the function:
Solution:
def beta_components(stock_returns: pd.Series,
market_returns: pd.Series) -> dict:
correlation = stock_returns.corr(market_returns)
stock_vol = stock_returns.std()
market_vol = market_returns.std()
vol_ratio = stock_vol / market_vol
beta = correlation * vol_ratio
return {
'beta': beta,
'correlation': correlation,
'vol_ratio': vol_ratio,
'stock_vol': stock_vol * np.sqrt(252),
'market_vol': market_vol * np.sqrt(252)
}
# Test for all stocks
print("Beta Component Analysis")
print("=" * 70)
print(f"{'Stock':<8} {'Beta':>8} {'Corr':>8} {'Vol Ratio':>10} {'Stock Vol':>10}")
print("-" * 70)
for ticker in tickers:
result = beta_components(stock_returns[ticker], market_returns)
print(f"{ticker:<8} {result['beta']:>8.3f} {result['correlation']:>8.3f} "
f"{result['vol_ratio']:>10.2f} {result['stock_vol']*100:>9.1f}%")
Section 9.2: Multi-Factor Models
CAPM uses only one factor (the market). The Fama-French models add further factors that help explain the cross-section of returns.
In this section, you will learn:
- Fama-French 3-factor model (Market, SMB, HML)
- Fama-French 5-factor model (adds RMW, CMA)
- Factor loading interpretation
9.2.1 Fama-French Factor Construction
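We'll simulate Fama-French factors for demonstration. Because these factors are synthetic, the loadings and alphas estimated below illustrate the mechanics only and carry no economic meaning; in practice, use the factor returns from Kenneth French's data library.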
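If you do switch to the real factor files, note that they quote returns in percent and name the market factor `Mkt-RF`. The sketch below shows one way to bring such a table into the decimal units and `MKT` naming used in this module. The helper `prepare_ff_factors` and the two hand-typed rows are hypothetical; they stand in for a file you have already downloaded and parsed.

```python
import pandas as pd

def prepare_ff_factors(raw: pd.DataFrame) -> pd.DataFrame:
    """Convert percent-unit factor data to decimals and align names.

    Assumes columns 'Mkt-RF', 'SMB', 'HML', 'RF' quoted in percent.
    """
    out = raw / 100.0
    return out.rename(columns={"Mkt-RF": "MKT"})

# Hypothetical rows typed by hand, standing in for a downloaded file
raw = pd.DataFrame(
    {"Mkt-RF": [0.52, -1.10], "SMB": [0.10, 0.25],
     "HML": [-0.30, 0.40], "RF": [0.02, 0.02]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03"]),
)
ff = prepare_ff_factors(raw)
print(ff)
```

After conversion, the factor dates would still need to be aligned with the stock return dates (for example with an inner join on the index) before fitting any regression.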
# Simulate Fama-French factors based on market returns
np.random.seed(42)
n_days = len(market_returns)
# SMB (Small Minus Big) - size factor
smb = pd.Series(
np.random.normal(0.0001, 0.006, n_days) + market_returns.values * 0.2,
index=market_returns.index,
name='SMB'
)
# HML (High Minus Low) - value factor
hml = pd.Series(
np.random.normal(0.0001, 0.005, n_days) - market_returns.values * 0.1,
index=market_returns.index,
name='HML'
)
# RMW (Robust Minus Weak) - profitability factor
rmw = pd.Series(
np.random.normal(0.0001, 0.004, n_days),
index=market_returns.index,
name='RMW'
)
# CMA (Conservative Minus Aggressive) - investment factor
cma = pd.Series(
np.random.normal(0.0001, 0.004, n_days),
index=market_returns.index,
name='CMA'
)
# Create factor DataFrame
factors_3 = pd.DataFrame({
'MKT': market_excess,
'SMB': smb,
'HML': hml
})
factors_5 = pd.DataFrame({
'MKT': market_excess,
'SMB': smb,
'HML': hml,
'RMW': rmw,
'CMA': cma
})
print("Fama-French Factors (Simulated)")
print("=" * 50)
print("\nFactor Statistics (Annualized):")
for col in factors_5.columns:
mean = factors_5[col].mean() * 252 * 100
std = factors_5[col].std() * np.sqrt(252) * 100
print(f" {col}: Mean={mean:.2f}%, Vol={std:.2f}%")
9.2.2 Three-Factor Model
$$R_i - R_f = \alpha + \beta_{MKT}(R_m - R_f) + \beta_{SMB} \cdot SMB + \beta_{HML} \cdot HML + \epsilon$$
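One way to build intuition for this regression is to generate returns from known loadings and check that least squares recovers them. The sketch below does this with NumPy only. The factor volatilities, `true_alpha`, and `true_betas` are made-up values, not estimates from the course data.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
# Hypothetical daily factor returns: columns are MKT, SMB, HML
F = rng.normal(0.0, [0.010, 0.006, 0.005], size=(n, 3))

true_alpha = 0.0002
true_betas = np.array([1.1, -0.3, 0.4])
# Stock excess return = alpha + factor exposures + idiosyncratic noise
y = true_alpha + F @ true_betas + rng.normal(0.0, 0.008, n)

# OLS with an intercept column: first coefficient is alpha, the rest are betas
X = np.column_stack([np.ones(n), F])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
alpha_hat, betas_hat = coef[0], coef[1:]

print("Estimated betas:", np.round(betas_hat, 3))
print(f"Estimated daily alpha: {alpha_hat:.5f}")
```

The betas are recovered closely, while the alpha estimate is comparatively noisy. This is typical: the intercept is a small number estimated against much larger daily noise.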
def fit_factor_model(stock_excess: pd.Series,
factors: pd.DataFrame) -> dict:
"""
Fit a multi-factor model.
Args:
stock_excess: Excess returns of the stock
factors: DataFrame of factor returns
Returns:
Dictionary with model results
"""
X = sm.add_constant(factors)
model = sm.OLS(stock_excess, X).fit()
return {
'alpha': model.params.iloc[0],
'alpha_annual': model.params.iloc[0] * 252,
'alpha_tstat': model.tvalues.iloc[0],
'alpha_pvalue': model.pvalues.iloc[0],
'betas': model.params.iloc[1:].to_dict(),
'r_squared': model.rsquared,
'model': model
}
# Fit 3-factor model for all stocks
print("Fama-French 3-Factor Model Results")
print("=" * 80)
print(f"{'Stock':<8} {'Alpha(ann)':>12} {'MKT':>8} {'SMB':>8} {'HML':>8} {'R²':>8}")
print("-" * 80)
ff3_results = {}
for ticker in tickers:
result = fit_factor_model(stock_excess[ticker], factors_3)
ff3_results[ticker] = result
print(f"{ticker:<8} {result['alpha_annual']*100:>11.2f}% "
f"{result['betas']['MKT']:>8.3f} "
f"{result['betas']['SMB']:>8.3f} "
f"{result['betas']['HML']:>8.3f} "
f"{result['r_squared']:>8.3f}")
9.2.3 Five-Factor Model
# Fit 5-factor model for all stocks
print("Fama-French 5-Factor Model Results")
print("=" * 100)
print(f"{'Stock':<8} {'Alpha':>10} {'MKT':>8} {'SMB':>8} {'HML':>8} {'RMW':>8} {'CMA':>8} {'R²':>8}")
print("-" * 100)
ff5_results = {}
for ticker in tickers:
result = fit_factor_model(stock_excess[ticker], factors_5)
ff5_results[ticker] = result
print(f"{ticker:<8} {result['alpha_annual']*100:>9.2f}% "
f"{result['betas']['MKT']:>8.3f} "
f"{result['betas']['SMB']:>8.3f} "
f"{result['betas']['HML']:>8.3f} "
f"{result['betas']['RMW']:>8.3f} "
f"{result['betas']['CMA']:>8.3f} "
f"{result['r_squared']:>8.3f}")
Exercise 9.2: Factor Model Comparison (Guided)
Your Task: Compare CAPM vs 3-factor vs 5-factor models for a stock.
Fill in the blanks to complete the function:
Solution:
def compare_factor_models(stock_excess: pd.Series,
market_excess: pd.Series,
factors_3: pd.DataFrame,
factors_5: pd.DataFrame) -> pd.DataFrame:
results = []
# CAPM
X_capm = sm.add_constant(market_excess)
model_capm = sm.OLS(stock_excess, X_capm).fit()
results.append({
'Model': 'CAPM',
'Alpha': model_capm.params.iloc[0] * 252 * 100,
'R_squared': model_capm.rsquared,
'AIC': model_capm.aic
})
# 3-Factor
X_3f = sm.add_constant(factors_3)
model_3f = sm.OLS(stock_excess, X_3f).fit()
results.append({
'Model': '3-Factor',
'Alpha': model_3f.params.iloc[0] * 252 * 100,
'R_squared': model_3f.rsquared,
'AIC': model_3f.aic
})
# 5-Factor
X_5f = sm.add_constant(factors_5)
model_5f = sm.OLS(stock_excess, X_5f).fit()
results.append({
'Model': '5-Factor',
'Alpha': model_5f.params.iloc[0] * 252 * 100,
'R_squared': model_5f.rsquared,
'AIC': model_5f.aic
})
return pd.DataFrame(results)
# Test
for ticker in ['AAPL', 'TSLA', 'JNJ']:
print(f"\n{ticker} Model Comparison:")
comp = compare_factor_models(stock_excess[ticker], market_excess, factors_3, factors_5)
print(comp.to_string(index=False))
Section 9.3: Alpha Analysis
Alpha is the intercept in a factor model: the average return not explained by the asset's factor exposures.
In this section, you will learn:
- Jensen's alpha interpretation
- Statistical significance testing
- Information ratio
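The t-statistic that statsmodels reports for the intercept is the alpha estimate divided by its standard error. The sketch below reproduces that calculation by hand with NumPy, using classical (homoskedastic) OLS standard errors, which is also the statsmodels default. The simulated stock, with a true daily alpha of 0.0005, is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
mkt = rng.normal(0.0003, 0.01, n)
# Hypothetical stock with a true daily alpha of 0.0005 and beta of 1.0
y = 0.0005 + 1.0 * mkt + rng.normal(0.0, 0.01, n)

X = np.column_stack([np.ones(n), mkt])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef

k = X.shape[1]                          # number of estimated coefficients
sigma2 = resid @ resid / (n - k)        # residual variance
cov = sigma2 * np.linalg.inv(X.T @ X)   # coefficient covariance matrix
se_alpha = np.sqrt(cov[0, 0])
t_alpha = coef[0] / se_alpha

print(f"Alpha (annualized): {coef[0] * 252 * 100:.2f}%")
print(f"Std. error (daily): {se_alpha:.6f}")
print(f"t-statistic:        {t_alpha:.2f}")
```

The standard error of a daily alpha is roughly the residual volatility divided by the square root of the sample size. Even an annualized alpha of several percent may therefore fail to reach |t| > 1.96 over a few years of daily data.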
# Alpha analysis with significance
print("Alpha Significance Analysis")
print("=" * 70)
print(f"{'Stock':<8} {'Alpha (ann)':>12} {'t-stat':>10} {'p-value':>10} {'Significant':>12}")
print("-" * 70)
for ticker in tickers:
result = ff3_results[ticker]
sig = "Yes" if result['alpha_pvalue'] < 0.05 else "No"
print(f"{ticker:<8} {result['alpha_annual']*100:>11.2f}% "
f"{result['alpha_tstat']:>10.3f} "
f"{result['alpha_pvalue']:>10.4f} "
f"{sig:>12}")
# Calculate Information Ratio
def information_ratio(stock_excess: pd.Series, model_result: dict) -> float:
"""
Calculate Information Ratio.
IR = annualized alpha / annualized residual volatility (used here as the tracking error)
"""
alpha_annual = model_result['alpha_annual']
residuals = model_result['model'].resid
tracking_error = residuals.std() * np.sqrt(252)
return alpha_annual / tracking_error
print("Information Ratio Analysis")
print("=" * 55)
print(f"{'Stock':<8} {'Alpha':>10} {'Track Error':>12} {'Info Ratio':>12}")
print("-" * 55)
for ticker in tickers:
result = ff3_results[ticker]
ir = information_ratio(stock_excess[ticker], result)
te = result['model'].resid.std() * np.sqrt(252)
print(f"{ticker:<8} {result['alpha_annual']*100:>9.2f}% "
f"{te*100:>11.2f}% "
f"{ir:>12.3f}")
print("\nInformation Ratio Interpretation:")
print(" > 0.5: Good | > 1.0: Excellent | > 1.5: Exceptional")
Exercise 9.3: Rolling Alpha Analysis (Open-ended)
Your Task:
Build a function that calculates rolling alpha with significance:
- Calculate alpha using a rolling window
- Track t-statistics over time
- Identify periods of significant alpha
- Visualize the results
Your implementation:
Solution:
def rolling_alpha_analysis(stock_excess: pd.Series,
                           factors: pd.DataFrame,
                           window: int = 252) -> pd.DataFrame:
    """
    Calculate rolling alpha with significance.
    Args:
        stock_excess: Stock excess returns
        factors: Factor returns
        window: Rolling window
    Returns:
        DataFrame with rolling alpha and t-stats
    """
    alphas = []
    tstats = []
    dates = []
    # Each window covers rows [i - window, i) and is labeled with its last date
    for i in range(window, len(stock_excess) + 1):
        y = stock_excess.iloc[i-window:i]
        X = sm.add_constant(factors.iloc[i-window:i])
        try:
            model = sm.OLS(y, X).fit()
            alpha_i = model.params.iloc[0] * 252  # Annualized
            t_i = model.tvalues.iloc[0]
        except Exception:
            # Keep the date but record missing values if the fit fails
            alpha_i, t_i = np.nan, np.nan
        alphas.append(alpha_i)
        tstats.append(t_i)
        dates.append(stock_excess.index[i-1])
    return pd.DataFrame({
        'alpha': alphas,
        't_stat': tstats
    }, index=dates)
# Calculate for AAPL
rolling_alpha_df = rolling_alpha_analysis(stock_excess['AAPL'], factors_3, 252)
# Visualize
fig, axes = plt.subplots(2, 1, figsize=(14, 8), sharex=True)
ax1 = axes[0]
ax1.plot(rolling_alpha_df.index, rolling_alpha_df['alpha'] * 100, linewidth=1.5)
ax1.axhline(0, color='black', linestyle='--', alpha=0.5)
ax1.fill_between(rolling_alpha_df.index, 0, rolling_alpha_df['alpha'] * 100,
where=rolling_alpha_df['alpha'] > 0, alpha=0.3, color='green')
ax1.fill_between(rolling_alpha_df.index, 0, rolling_alpha_df['alpha'] * 100,
where=rolling_alpha_df['alpha'] <= 0, alpha=0.3, color='red')
ax1.set_ylabel('Alpha (%)')
ax1.set_title('AAPL Rolling Alpha (252-day)')
ax2 = axes[1]
ax2.plot(rolling_alpha_df.index, rolling_alpha_df['t_stat'], linewidth=1.5)
ax2.axhline(1.96, color='red', linestyle='--', label='t=1.96')
ax2.axhline(-1.96, color='red', linestyle='--')
ax2.axhline(0, color='black', linestyle='-', alpha=0.3)
ax2.set_ylabel('t-statistic')
ax2.set_xlabel('Date')
ax2.set_title('Alpha Significance')
ax2.legend()
plt.tight_layout()
plt.show()
# Count significant periods
sig_positive = (rolling_alpha_df['t_stat'] > 1.96).sum()
sig_negative = (rolling_alpha_df['t_stat'] < -1.96).sum()
total = len(rolling_alpha_df)
print(f"Significant positive alpha: {sig_positive/total*100:.1f}% of days")
print(f"Significant negative alpha: {sig_negative/total*100:.1f}% of days")
Section 9.4: Factor Attribution
Factor attribution decomposes portfolio returns into factor contributions.
In this section, you will learn:
- Return decomposition by factors
- Factor contribution analysis
- Style analysis
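A useful property of this decomposition is that it is exact in sample. When the regression includes an intercept, the residuals average to zero, so the factor contributions plus alpha add up to the mean excess return. The sketch below verifies this on synthetic data. The factor returns and loadings are made up for the illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 750
# Hypothetical factor returns (three factors) and a stock built from them
F = rng.normal(0.0003, 0.008, size=(n, 3))
y = 0.0001 + F @ np.array([1.0, 0.2, -0.1]) + rng.normal(0.0, 0.01, n)

X = np.column_stack([np.ones(n), F])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
alpha_hat, betas_hat = coef[0], coef[1:]

# Annualized contribution of each factor: loading times mean factor return
factor_contrib = betas_hat * F.mean(axis=0) * 252
alpha_contrib = alpha_hat * 252
total = y.mean() * 252

print("Factor contributions:", np.round(factor_contrib * 100, 2))
print(f"Alpha contribution:   {alpha_contrib * 100:.2f}%")
print(f"Sum of parts:         {(factor_contrib.sum() + alpha_contrib) * 100:.2f}%")
print(f"Total excess return:  {total * 100:.2f}%")
```

The identity holds only when the loadings and the factor means come from the same sample. Applying loadings estimated on one period to factor returns from another leaves an unexplained remainder.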
def factor_attribution(stock_excess: pd.Series,
factors: pd.DataFrame,
model_result: dict) -> dict:
"""
Decompose returns by factor contributions.
Args:
stock_excess: Stock excess returns
factors: Factor returns
model_result: Fitted model result
Returns:
Dictionary with factor contributions
"""
betas = model_result['betas']
alpha = model_result['alpha']
# Total return
total_return = stock_excess.mean() * 252
# Factor contributions
contributions = {}
for factor, beta in betas.items():
factor_return = factors[factor].mean() * 252
contributions[factor] = beta * factor_return
# Alpha contribution
contributions['Alpha'] = alpha * 252
return {
'total_return': total_return,
'contributions': contributions
}
# Attribution for all stocks
print("Factor Attribution Analysis (Annualized)")
print("=" * 80)
for ticker in ['AAPL', 'TSLA', 'JNJ']:
attr = factor_attribution(stock_excess[ticker], factors_3, ff3_results[ticker])
print(f"\n{ticker}:")
print(f" Total Excess Return: {attr['total_return']*100:.2f}%")
print(" Contributions:")
for factor, contrib in attr['contributions'].items():
print(f" {factor}: {contrib*100:+.2f}%")
# Visualize factor contributions
def plot_factor_attribution(ticker: str, attr: dict):
"""Create a bar chart of factor contributions plus a total bar."""
fig, ax = plt.subplots(figsize=(10, 6))
contributions = attr['contributions']
labels = list(contributions.keys())
values = [v * 100 for v in contributions.values()]
colors = ['green' if v > 0 else 'red' for v in values]
bars = ax.bar(labels, values, color=colors, edgecolor='black', alpha=0.7)
# Add total
total = sum(values)
ax.bar('Total', total, color='blue', edgecolor='black', alpha=0.7)
ax.axhline(0, color='black', linewidth=0.5)
ax.set_ylabel('Contribution (%)')
ax.set_title(f'{ticker} Factor Attribution')
# Add value labels
for bar, val in zip(bars, values):
ax.annotate(f'{val:+.2f}%',
xy=(bar.get_x() + bar.get_width()/2, val),
ha='center', va='bottom' if val > 0 else 'top')
plt.tight_layout()
plt.show()
# Plot for a selected stock
attr = factor_attribution(stock_excess['AAPL'], factors_3, ff3_results['AAPL'])
plot_factor_attribution('AAPL', attr)
Exercise 9.4: Portfolio Factor Exposure (Guided)
Your Task: Calculate the aggregate factor exposures for a portfolio of stocks.
Fill in the blanks to complete the function:
Solution:
def portfolio_factor_exposure(weights: dict,
factor_results: dict) -> dict:
first_result = list(factor_results.values())[0]
factors = list(first_result['betas'].keys())
portfolio_betas = {f: 0.0 for f in factors}
portfolio_alpha = 0.0
for stock, weight in weights.items():
if stock in factor_results:
result = factor_results[stock]
for factor in factors:
portfolio_betas[factor] += weight * result['betas'][factor]
portfolio_alpha += weight * result['alpha']
return {
'betas': portfolio_betas,
'alpha': portfolio_alpha,
'alpha_annual': portfolio_alpha * 252
}
# Test with different portfolios
portfolios = {
'Tech': {'AAPL': 0.33, 'MSFT': 0.34, 'GOOGL': 0.33},
'Defensive': {'JNJ': 0.5, 'XOM': 0.5},
'Growth': {'TSLA': 0.5, 'AMZN': 0.5}
}
print("Portfolio Factor Exposures")
print("=" * 60)
for port_name, weights in portfolios.items():
exp = portfolio_factor_exposure(weights, ff3_results)
print(f"\n{port_name}:")
print(f" MKT: {exp['betas']['MKT']:.3f}")
print(f" SMB: {exp['betas']['SMB']:.3f}")
print(f" HML: {exp['betas']['HML']:.3f}")
print(f" Alpha (ann): {exp['alpha_annual']*100:.2f}%")
Exercise 9.5: Style Analysis (Open-ended)
Your Task:
Build a function that performs style analysis:
- Determine if a stock is value/growth based on HML loading
- Determine if it's small/large cap based on SMB loading
- Create a style quadrant visualization
Your implementation:
Solution:
def style_analysis(factor_results: dict) -> pd.DataFrame:
    """
    Perform style analysis based on factor loadings.
    Args:
        factor_results: Stock -> factor model results
    Returns:
        DataFrame with style classifications
    """
    results = []
    for ticker, result in factor_results.items():
        smb = result['betas']['SMB']
        hml = result['betas']['HML']
        mkt = result['betas']['MKT']
        # Size classification
        size = 'Small' if smb > 0 else 'Large'
        # Value/Growth classification
        style = 'Value' if hml > 0 else 'Growth'
        # Combined style box
        quadrant = f'{size}-{style}'
        results.append({
            'Stock': ticker,
            'SMB': smb,
            'HML': hml,
            'MKT': mkt,
            'Size': size,
            'Style': style,
            'Quadrant': quadrant
        })
    return pd.DataFrame(results)
# Perform style analysis
styles = style_analysis(ff3_results)
print("Style Analysis")
print("=" * 70)
print(styles.to_string(index=False))
# Plot style quadrant
fig, ax = plt.subplots(figsize=(10, 8))
for _, row in styles.iterrows():
    color = {'Large-Value': 'blue', 'Large-Growth': 'green',
             'Small-Value': 'orange', 'Small-Growth': 'red'}[row['Quadrant']]
    ax.scatter(row['HML'], row['SMB'], s=200, c=color,
               edgecolors='black', linewidth=1.5)
    ax.annotate(row['Stock'], (row['HML'], row['SMB']),
                xytext=(5, 5), textcoords='offset points')
ax.axhline(0, color='black', linestyle='-', linewidth=0.5)
ax.axvline(0, color='black', linestyle='-', linewidth=0.5)
# Add quadrant labels. x = HML (growth < 0 < value), y = SMB (large < 0 < small).
# Positions are in axes coordinates, so they line up with the quadrants only
# when the zero lines fall near the middle of the plot.
ax.text(0.1, 0.1, 'Large\nGrowth', transform=ax.transAxes, fontsize=10, alpha=0.5)
ax.text(0.8, 0.1, 'Large\nValue', transform=ax.transAxes, fontsize=10, alpha=0.5)
ax.text(0.1, 0.85, 'Small\nGrowth', transform=ax.transAxes, fontsize=10, alpha=0.5)
ax.text(0.8, 0.85, 'Small\nValue', transform=ax.transAxes, fontsize=10, alpha=0.5)
ax.set_xlabel('HML Loading (Growth ↔ Value)')
ax.set_ylabel('SMB Loading (Large ↔ Small)')
ax.set_title('Style Box Analysis')
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Exercise 9.6: Complete Factor Model System (Open-ended)
Your Task:
Build a comprehensive FactorModel class that:
- Fits CAPM, 3-factor, and 5-factor models
- Calculates alpha with significance
- Performs factor attribution
- Generates visualizations and reports
Your implementation:
Solution:
class FactorModelAnalyzer:
    """
    Comprehensive factor model analysis system.
    Supports CAPM, 3-factor, and 5-factor models with
    alpha analysis, attribution, and style classification.
    """
    def __init__(self, stock_excess: pd.Series,
                 market_excess: pd.Series,
                 factors: pd.DataFrame,
                 ticker: str = "Stock"):
        self.stock_excess = stock_excess
        self.market_excess = market_excess
        self.factors = factors
        self.ticker = ticker
        self.models = {}
        self._fit_all_models()
    def _fit_all_models(self):
        """Fit CAPM and multi-factor models."""
        # CAPM
        X_capm = sm.add_constant(self.market_excess)
        self.models['CAPM'] = sm.OLS(self.stock_excess, X_capm).fit()
        # Multi-factor
        X_mf = sm.add_constant(self.factors)
        self.models['Multi-Factor'] = sm.OLS(self.stock_excess, X_mf).fit()
    def get_alpha(self, model_name: str = 'Multi-Factor') -> dict:
        """Get alpha statistics."""
        model = self.models[model_name]
        return {
            'alpha_daily': model.params.iloc[0],
            'alpha_annual': model.params.iloc[0] * 252,
            't_stat': model.tvalues.iloc[0],
            'p_value': model.pvalues.iloc[0],
            'significant': model.pvalues.iloc[0] < 0.05
        }
    def get_betas(self, model_name: str = 'Multi-Factor') -> dict:
        """Get factor betas."""
        model = self.models[model_name]
        return model.params.iloc[1:].to_dict()
    def get_r_squared(self, model_name: str = 'Multi-Factor') -> float:
        """Get model R-squared."""
        return self.models[model_name].rsquared
    def factor_attribution(self) -> dict:
        """Decompose returns by factors."""
        betas = self.get_betas()
        alpha = self.get_alpha()['alpha_daily']
        contributions = {}
        for factor, beta in betas.items():
            contributions[factor] = beta * self.factors[factor].mean() * 252
        contributions['Alpha'] = alpha * 252
        return contributions
    def information_ratio(self) -> float:
        """Calculate Information Ratio."""
        alpha = self.get_alpha()['alpha_annual']
        te = self.models['Multi-Factor'].resid.std() * np.sqrt(252)
        return alpha / te
    def summary(self):
        """Print comprehensive summary."""
        print(f"\n{'='*60}")
        print(f"FACTOR MODEL ANALYSIS: {self.ticker}")
        print(f"{'='*60}")
        # Model comparison
        print(f"\nMODEL COMPARISON")
        print(f"{'-'*40}")
        for name, model in self.models.items():
            alpha = model.params.iloc[0] * 252 * 100
            print(f"{name}: R²={model.rsquared:.3f}, Alpha={alpha:+.2f}%")
        # Alpha analysis
        alpha = self.get_alpha()
        print(f"\nALPHA ANALYSIS")
        print(f"{'-'*40}")
        print(f"Alpha (annual): {alpha['alpha_annual']*100:.2f}%")
        print(f"t-statistic: {alpha['t_stat']:.3f}")
        print(f"p-value: {alpha['p_value']:.4f}")
        print(f"Significant: {'Yes' if alpha['significant'] else 'No'}")
        print(f"Information Ratio: {self.information_ratio():.3f}")
        # Factor betas
        print(f"\nFACTOR LOADINGS")
        print(f"{'-'*40}")
        for factor, beta in self.get_betas().items():
            print(f"{factor}: {beta:.3f}")
        # Attribution
        print(f"\nFACTOR ATTRIBUTION (Annual)")
        print(f"{'-'*40}")
        for factor, contrib in self.factor_attribution().items():
            print(f"{factor}: {contrib*100:+.2f}%")
# Test
analyzer = FactorModelAnalyzer(
stock_excess['AAPL'],
market_excess,
factors_3,
'AAPL'
)
analyzer.summary()
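The alpha t-statistic and the Information Ratio printed by `summary()` are closely related: with daily data, the t-statistic is approximately IR × √(years of data), provided the factor means are small relative to their volatility. The sketch below checks this on synthetic data using plain NumPy least squares rather than `statsmodels`; the factor volatility, true alpha, and beta are assumptions chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days = 252 * 5                                  # five years of daily data

# Synthetic single-factor world (illustrative parameters only)
factor = rng.normal(0.0003, 0.01, n_days)         # daily factor returns
noise = rng.normal(0.0, 0.008, n_days)            # idiosyncratic returns
stock = 0.0002 + 1.1 * factor + noise             # true daily alpha = 2 bps

# OLS with an intercept: columns are [const, factor]
X = np.column_stack([np.ones(n_days), factor])
coef, *_ = np.linalg.lstsq(X, stock, rcond=None)
resid = stock - X @ coef

# Standard error of the intercept from the usual OLS covariance formula
dof = n_days - X.shape[1]
sigma2 = resid @ resid / dof
cov = sigma2 * np.linalg.inv(X.T @ X)
t_alpha = coef[0] / np.sqrt(cov[0, 0])

# Information Ratio: annualized alpha over annualized residual volatility
ir = (coef[0] * 252) / (np.sqrt(sigma2) * np.sqrt(252))
years = n_days / 252

print(f"t-stat of alpha:  {t_alpha:.3f}")
print(f"IR * sqrt(years): {ir * np.sqrt(years):.3f}")
```

One practical reading of this relationship: a strategy with a modest Information Ratio needs many years of data before its alpha becomes statistically distinguishable from zero.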
Module Project: Production Factor Analysis System
Build a comprehensive factor analysis system suitable for institutional use.
# YOUR CODE HERE - Module Project
Solution:
class ProductionFactorSystem:
    """
    Production-ready factor analysis system.
    Features:
    - Multi-stock factor model analysis
    - Rolling analysis for time-varying exposures
    - Portfolio-level factor attribution
    - Style classification
    - Comprehensive reporting
    """
    def __init__(self, returns: pd.DataFrame,
                 market_returns: pd.Series,
                 factors: pd.DataFrame,
                 risk_free_rate: float = 0.04):
        self.returns = returns
        self.market_returns = market_returns
        self.factors = factors
        self.rf = risk_free_rate
        self.daily_rf = risk_free_rate / 252
        # Calculate excess returns
        self.excess_returns = returns.sub(self.daily_rf, axis=0)
        self.market_excess = market_returns - self.daily_rf
        # Fit models for all stocks
        self.results = {}
        self._fit_all_models()
    def _fit_all_models(self):
        """Fit factor models for all stocks."""
        for ticker in self.returns.columns:
            X = sm.add_constant(self.factors)
            model = sm.OLS(self.excess_returns[ticker], X).fit()
            self.results[ticker] = {
                'model': model,
                'alpha': model.params.iloc[0],
                'alpha_annual': model.params.iloc[0] * 252,
                'alpha_tstat': model.tvalues.iloc[0],
                'alpha_pvalue': model.pvalues.iloc[0],
                'betas': model.params.iloc[1:].to_dict(),
                'r_squared': model.rsquared
            }
    def get_stock_summary(self, ticker: str) -> dict:
        """Get summary for a single stock."""
        return self.results[ticker]
    def portfolio_exposure(self, weights: dict) -> dict:
        """Calculate portfolio-level factor exposures."""
        factors = list(self.factors.columns)
        port_betas = {f: 0.0 for f in factors}
        port_alpha = 0.0
        for stock, weight in weights.items():
            if stock in self.results:
                for factor in factors:
                    port_betas[factor] += weight * self.results[stock]['betas'][factor]
                port_alpha += weight * self.results[stock]['alpha']
        return {
            'betas': port_betas,
            'alpha': port_alpha,
            'alpha_annual': port_alpha * 252
        }
    def style_classification(self) -> pd.DataFrame:
        """Classify stocks by style."""
        classifications = []
        for ticker, result in self.results.items():
            smb = result['betas'].get('SMB', 0)
            hml = result['betas'].get('HML', 0)
            size = 'Small' if smb > 0 else 'Large'
            style = 'Value' if hml > 0 else 'Growth'
            classifications.append({
                'Stock': ticker,
                'Size': size,
                'Style': style,
                'Quadrant': f'{size}-{style}'
            })
        return pd.DataFrame(classifications)
    def report(self, portfolio_weights: dict = None):
        """Generate comprehensive report."""
        print("\n" + "=" * 70)
        print("PRODUCTION FACTOR ANALYSIS REPORT")
        print("=" * 70)
        # Individual stocks
        print("\nINDIVIDUAL STOCK ANALYSIS")
        print("-" * 70)
        print(f"{'Stock':<8} {'Alpha':>10} {'MKT':>8} {'SMB':>8} {'HML':>8} {'R²':>8}")
        print("-" * 70)
        for ticker, result in self.results.items():
            print(f"{ticker:<8} {result['alpha_annual']*100:>9.2f}% "
                  f"{result['betas'].get('MKT', 0):>8.3f} "
                  f"{result['betas'].get('SMB', 0):>8.3f} "
                  f"{result['betas'].get('HML', 0):>8.3f} "
                  f"{result['r_squared']:>8.3f}")
        # Style classification
        print("\nSTYLE CLASSIFICATION")
        print("-" * 70)
        styles = self.style_classification()
        print(styles.to_string(index=False))
        # Portfolio analysis
        if portfolio_weights:
            print("\nPORTFOLIO FACTOR EXPOSURE")
            print("-" * 70)
            port_exp = self.portfolio_exposure(portfolio_weights)
            print(f"Weights: {portfolio_weights}")
            print(f"Portfolio Alpha (ann): {port_exp['alpha_annual']*100:.2f}%")
            for factor, beta in port_exp['betas'].items():
                print(f" {factor} Beta: {beta:.3f}")
# Test
system = ProductionFactorSystem(
returns=stock_returns,
market_returns=market_returns,
factors=factors_3
)
test_weights = {'AAPL': 0.3, 'MSFT': 0.3, 'JNJ': 0.2, 'XOM': 0.2}
system.report(test_weights)
Key Takeaways
What You Learned
1. CAPM and Beta
- Beta measures sensitivity to market movements
- Can be calculated via covariance or regression
- Rolling beta shows time-varying exposure
2. Multi-Factor Models
- Fama-French 3-factor: Market, Size (SMB), Value (HML)
- 5-factor adds Profitability (RMW) and Investment (CMA)
- More factors explain more variance but risk overfitting
3. Alpha Analysis
- Alpha is return not explained by factor exposures
- Statistical significance at the 5% level requires |t-stat| > 1.96 (two-sided test, large sample)
- Information Ratio measures alpha per unit tracking error
4. Factor Attribution
- Decomposes returns into factor contributions
- Style analysis classifies by size and value loadings
- Portfolio exposure is weighted average of stock exposures
Coming Up Next
In Module 10: Monte Carlo Simulation, we'll explore:
- Geometric Brownian Motion simulation
- Correlated multi-asset simulation
- Option pricing with Monte Carlo
- Portfolio simulation and scenario analysis
Congratulations on completing Module 9!
Module 10: Monte Carlo Simulation
Course 3: Quantitative Finance & Portfolio Theory
Part 4: Simulation & Analytics
Learning Objectives
By the end of this module, you will be able to:
- Generate reproducible random samples for financial simulations
- Simulate stock price paths using Geometric Brownian Motion
- Create correlated multi-asset simulations with Cholesky decomposition
- Apply Monte Carlo methods to portfolio analysis and option pricing
| Attribute | Value |
|---|---|
| Duration | ~2.5 hours |
| Exercises | 6 (3 guided + 3 open-ended) |
| Prerequisites | Modules 7-8: Risk Modeling |
Setup and Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
from scipy import stats
from scipy.linalg import cholesky
import warnings
warnings.filterwarnings('ignore')
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
np.random.seed(42)
print('Module 10: Monte Carlo Simulation - Ready!')
Load Data
# Download data
tickers = ['SPY', 'QQQ', 'TLT', 'GLD']
data = yf.download(tickers, start='2015-01-01', end='2024-01-01', progress=False)
# Handle MultiIndex columns
if isinstance(data.columns, pd.MultiIndex):
    if 'Adj Close' in data.columns.get_level_values(0):
        prices = data['Adj Close']
    elif 'Close' in data.columns.get_level_values(0):
        prices = data['Close']
else:
    prices = data['Adj Close'] if 'Adj Close' in data.columns else data['Close']
returns = prices.pct_change().dropna()
print(f'Data loaded: {len(prices)} days')
print(f'Assets: {list(returns.columns)}')
Section 10.1: Random Number Generation
Monte Carlo methods depend on generating random numbers. Understanding how to properly generate and control randomness is essential for reproducible research.
In this section, you will learn:
- Pseudo-random number generation with seeds
- Modern NumPy Generator objects
- Generating samples from different distributions
10.1.1 Reproducibility with Seeds
Computers generate "pseudo-random" numbers: sequences that look random but are fully determined by a starting seed. Fixing the seed makes a simulation give the same results on every run.
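# Basic random number generation
# The setup cell seeded the global generator with 42, so reseed it from
# OS entropy first to show what unseeded output looks like
np.random.seed(None)
print("Random Numbers Without Seed (different each run):")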
print(np.random.randn(5))
print(np.random.randn(5))
print("\nWith Seed (reproducible):")
np.random.seed(42)
print(np.random.randn(5))
np.random.seed(42) # Reset to same seed
print(np.random.randn(5)) # Same numbers!
10.1.2 Modern Generator Objects
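NumPy's recommended approach uses Generator objects, created with np.random.default_rng. Each Generator carries its own state, so seeding one does not affect the global np.random functions or any other Generator.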
# Modern NumPy random generation
rng = np.random.default_rng(seed=42)
print("Using Generator object:")
print(f"Standard normal: {rng.standard_normal(5)}")
print(f"Uniform [0,1]: {rng.uniform(0, 1, 5)}")
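# integers() excludes the upper bound by default; endpoint=True makes it inclusive
print(f"Integers [1,10]: {rng.integers(1, 10, size=5, endpoint=True)}")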
# Multiple independent generators
rng1 = np.random.default_rng(seed=100)
rng2 = np.random.default_rng(seed=200)
print(f"\nGenerator 1: {rng1.standard_normal(3)}")
print(f"Generator 2: {rng2.standard_normal(3)}")
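Picking two different seeds by hand works for a demonstration. NumPy also provides `SeedSequence.spawn`, which derives child seeds from one master seed and is designed to give statistically independent streams. A minimal sketch follows; the master seed 42 and the choice of three streams are arbitrary.

```python
import numpy as np

# One master seed controls the whole experiment
master = np.random.SeedSequence(42)

# Derive one child seed per simulation stream
child_seeds = master.spawn(3)
streams = [np.random.default_rng(s) for s in child_seeds]

# Draw a few numbers from each stream
draws = [stream.standard_normal(3) for stream in streams]
for i, d in enumerate(draws):
    print(f"Stream {i}: {d}")

# Rebuilding from the same master seed reproduces every stream exactly
replay_streams = [np.random.default_rng(s)
                  for s in np.random.SeedSequence(42).spawn(3)]
replay_draws = [r.standard_normal(3) for r in replay_streams]

reproduced = all(np.array_equal(a, b) for a, b in zip(draws, replay_draws))
print(f"Reproduced exactly: {reproduced}")
```

This pattern is useful when a simulation is split across several workers: each worker gets its own child stream, and the whole run is still reproducible from a single number.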
10.1.3 Different Distributions
Financial returns often have "fat tails" - extreme values occur more frequently than a normal distribution predicts.
rng = np.random.default_rng(seed=42)
n_samples = 10000
# Generate samples from various distributions
normal = rng.normal(loc=0, scale=1, size=n_samples)
student_t = rng.standard_t(df=5, size=n_samples)
uniform = rng.uniform(-1, 1, size=n_samples)
lognormal = rng.lognormal(mean=0, sigma=0.5, size=n_samples)
# Compare distributions
fig, axes = plt.subplots(2, 2, figsize=(12, 8))
distributions = [
(normal, 'Normal(0, 1)', 'steelblue'),
(student_t, 'Student-t (df=5)', 'orange'),
(uniform, 'Uniform(-1, 1)', 'green'),
(lognormal, 'Log-Normal(0, 0.5)', 'crimson')
]
# 'samples' is used as the loop variable so the downloaded 'data' is not overwritten
for ax, (samples, name, color) in zip(axes.flatten(), distributions):
    ax.hist(samples, bins=50, density=True, alpha=0.7, color=color, edgecolor='white')
    ax.set_title(name)
    ax.set_xlabel('Value')
    ax.set_ylabel('Density')
plt.tight_layout()
plt.show()
Exercise 10.1: Fat Tail Analysis (Guided)
Your Task: Compare the frequency of extreme events between Normal and Student-t distributions.
Fill in the blanks to complete the analysis:
Solution:
def compare_tail_events(n_samples: int = 100000, threshold: float = 3.0, df: int = 4) -> dict:
    """
    Compare extreme events between Normal and Student-t distributions.
    Both samples are scaled to unit variance so that the threshold means
    the same number of standard deviations for each (requires df > 2).
    """
    rng = np.random.default_rng(seed=42)
    # Generate normal samples with mean=0, std=1
    normal_samples = rng.normal(0, 1, n_samples)
    # Generate Student-t samples and rescale to unit variance
    # (raw Student-t variance is df / (df - 2))
    t_samples = rng.standard_t(df=df, size=n_samples) * np.sqrt((df - 2) / df)
    # Count events beyond threshold std devs
    normal_extreme = np.sum(np.abs(normal_samples) > threshold)
    t_extreme = np.sum(np.abs(t_samples) > threshold)
    return {
        'normal_count': normal_extreme,
        'normal_pct': normal_extreme / n_samples * 100,
        't_count': t_extreme,
        't_pct': t_extreme / n_samples * 100,
        'ratio': t_extreme / max(normal_extreme, 1)
    }
# Test
result = compare_tail_events()
print(f"Normal extreme events: {result['normal_count']} ({result['normal_pct']:.3f}%)")
print(f"Student-t extreme events: {result['t_count']} ({result['t_pct']:.3f}%)")
print(f"Student-t has {result['ratio']:.1f}x more extreme events")
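The simulated frequencies can be cross-checked against exact tail probabilities from `scipy.stats`. The sketch below reports the Student-t tail both in raw units and after rescaling to unit variance, because a raw t variable with df=4 has standard deviation √2 rather than 1, which changes what a "3 standard deviation" move means. The threshold and degrees of freedom mirror the exercise defaults.

```python
import numpy as np
from scipy import stats

threshold = 3.0
df = 4

# Two-sided tail probability for a standard normal
p_normal = 2 * stats.norm.sf(threshold)

# Raw Student-t: its standard deviation is sqrt(df / (df - 2)), not 1
p_t_raw = 2 * stats.t.sf(threshold, df)

# Student-t rescaled to unit variance: a 3-sigma move in the rescaled
# variable corresponds to threshold * sqrt(df / (df - 2)) in raw t units
t_std = np.sqrt(df / (df - 2))
p_t_unit = 2 * stats.t.sf(threshold * t_std, df)

print(f"Normal:               {p_normal * 100:.3f}%")
print(f"Student-t (raw):      {p_t_raw * 100:.3f}%")
print(f"Student-t (unit var): {p_t_unit * 100:.3f}%")
```

Either way the Student-t tail is heavier than the normal tail, but the size of the gap depends on whether the comparison is made in raw units or in standard deviations.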
Section 10.2: Simulating Price Paths
The standard model for stock prices assumes they follow Geometric Brownian Motion (GBM).
In this section, you will learn:
- The GBM model and its assumptions
- Simulating single-asset price paths
- Adding fat tails to simulations
10.2.1 Geometric Brownian Motion
The discrete-time solution for GBM is:
$$S_{t+1} = S_t \exp\left[(\mu - \frac{\sigma^2}{2})\Delta t + \sigma\sqrt{\Delta t} \cdot Z\right]$$
Where $Z \sim N(0,1)$.
def simulate_gbm(S0: float, mu: float, sigma: float, T: float,
                 n_steps: int, n_paths: int, seed: int = None) -> np.ndarray:
    """
    Simulate stock price paths using Geometric Brownian Motion.
    Args:
        S0: Initial stock price
        mu: Annual drift (expected return)
        sigma: Annual volatility
        T: Time horizon in years
        n_steps: Number of time steps
        n_paths: Number of simulation paths
        seed: Random seed for reproducibility
    Returns:
        Price paths array of shape (n_steps + 1, n_paths)
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    # Pre-compute constants
    drift = (mu - 0.5 * sigma**2) * dt
    diffusion = sigma * np.sqrt(dt)
    # Generate random shocks
    Z = rng.standard_normal((n_steps, n_paths))
    # Calculate log returns
    log_returns = drift + diffusion * Z
    # Build price paths
    log_prices = np.zeros((n_steps + 1, n_paths))
    log_prices[0] = np.log(S0)
    log_prices[1:] = np.log(S0) + np.cumsum(log_returns, axis=0)
    return np.exp(log_prices)
# Estimate parameters from SPY
spy_returns = returns['SPY']
mu_annual = spy_returns.mean() * 252
sigma_annual = spy_returns.std() * np.sqrt(252)
S0 = float(prices['SPY'].iloc[-1])
print(f"SPY Parameters:")
print(f" Current price: ${S0:.2f}")
print(f" Annual return (mu): {mu_annual*100:.1f}%")
print(f" Annual volatility (sigma): {sigma_annual*100:.1f}%")
# Simulate 1 year of SPY prices
T = 1 # 1 year
n_steps = 252 # Daily steps
n_paths = 1000
paths = simulate_gbm(S0, mu_annual, sigma_annual, T, n_steps, n_paths, seed=42)
# Visualize
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Left: Sample paths
ax1 = axes[0]
time_grid = np.linspace(0, T, n_steps + 1)
for i in range(min(100, n_paths)):
    ax1.plot(time_grid, paths[:, i], alpha=0.1, color='steelblue', linewidth=0.5)
# Add percentile bands
percentiles = [5, 25, 50, 75, 95]
for p in percentiles:
    ax1.plot(time_grid, np.percentile(paths, p, axis=1), linewidth=2, label=f'{p}th pct')
ax1.axhline(S0, color='black', linestyle='--', label='Initial Price')
ax1.set_xlabel('Time (years)')
ax1.set_ylabel('Price ($)')
ax1.set_title(f'SPY Price Simulation ({n_paths} paths)')
ax1.legend(loc='upper left')
# Right: Terminal price distribution
ax2 = axes[1]
terminal_prices = paths[-1, :]
ax2.hist(terminal_prices, bins=50, density=True, alpha=0.7, color='steelblue', edgecolor='white')
ax2.axvline(S0, color='black', linestyle='--', linewidth=2, label=f'Initial: ${S0:.0f}')
ax2.axvline(np.mean(terminal_prices), color='orange', linewidth=2, label=f'Mean: ${np.mean(terminal_prices):.0f}')
ax2.set_xlabel('Terminal Price ($)')
ax2.set_ylabel('Density')
ax2.set_title('Distribution of 1-Year Ending Prices')
ax2.legend()
plt.tight_layout()
plt.show()
# Summary statistics
terminal_returns = (terminal_prices / S0 - 1) * 100
print(f"\nTerminal Price Statistics:")
print(f" Mean: ${np.mean(terminal_prices):.2f} ({np.mean(terminal_returns):+.1f}%)")
print(f" Median: ${np.median(terminal_prices):.2f}")
print(f" 5th percentile: ${np.percentile(terminal_prices, 5):.2f}")
print(f" 95th percentile: ${np.percentile(terminal_prices, 95):.2f}")
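The $-\sigma^2/2$ term in the GBM drift is what keeps the expected price growing at rate $\mu$: under GBM the mean terminal price is $S_0 e^{\mu T}$, while the median is the lower value $S_0 e^{(\mu - \sigma^2/2)T}$, so the mean of the simulated terminal prices should sit above the median. The self-contained check below uses assumed parameters (not the SPY estimates) and monthly steps to keep memory use small.

```python
import numpy as np

# Illustrative parameters (assumed, not estimated from data)
S0, mu, sigma, T = 100.0, 0.08, 0.20, 1.0
n_steps, n_paths = 12, 200_000

rng = np.random.default_rng(0)
dt = T / n_steps

# Each step's log return is N((mu - sigma^2/2) * dt, sigma^2 * dt)
Z = rng.standard_normal((n_steps, n_paths))
log_returns = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * Z

# Terminal price = S0 * exp(sum of log returns along each path)
terminal = S0 * np.exp(log_returns.sum(axis=0))

theory_mean = S0 * np.exp(mu * T)                        # E[S_T]
theory_median = S0 * np.exp((mu - 0.5 * sigma**2) * T)   # median of S_T

print(f"Mean:   simulated {terminal.mean():.2f}, theory {theory_mean:.2f}")
print(f"Median: simulated {np.median(terminal):.2f}, theory {theory_median:.2f}")
```

The gap between mean and median widens with volatility and horizon, which matters when reporting a "typical" outcome from a simulation.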
10.2.2 Adding Fat Tails
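GBM assumes normally distributed log returns, but real returns have fat tails. One simple adjustment is to draw the shocks from a Student-t distribution instead, rescaled to unit variance so that sigma keeps its meaning as the volatility.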
def simulate_gbm_fat_tails(S0: float, mu: float, sigma: float, T: float,
                           n_steps: int, n_paths: int, df: int = 5,
                           seed: int = None) -> np.ndarray:
    """
    Simulate price paths with fat-tailed returns (Student-t).
    Args:
        df: Degrees of freedom for Student-t (lower = fatter tails, must be > 2)
    """
    if df <= 2:
        raise ValueError("df must be greater than 2 for the variance to be finite")
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    drift = (mu - 0.5 * sigma**2) * dt
    # Student-t has variance = df/(df-2), scale to match desired sigma
    scale = np.sqrt((df - 2) / df)
    diffusion = sigma * np.sqrt(dt) * scale
    # Generate Student-t shocks
    Z = rng.standard_t(df, size=(n_steps, n_paths))
    log_returns = drift + diffusion * Z
    log_prices = np.zeros((n_steps + 1, n_paths))
    log_prices[0] = np.log(S0)
    log_prices[1:] = np.log(S0) + np.cumsum(log_returns, axis=0)
    return np.exp(log_prices)
# Compare normal vs fat-tailed simulations
paths_normal = simulate_gbm(S0, mu_annual, sigma_annual, T, n_steps, n_paths, seed=42)
paths_fat = simulate_gbm_fat_tails(S0, mu_annual, sigma_annual, T, n_steps, n_paths, df=5, seed=42)
# Compare terminal distributions
fig, ax = plt.subplots(figsize=(10, 5))
ax.hist(paths_normal[-1, :], bins=50, density=True, alpha=0.5, label='Normal', color='steelblue')
ax.hist(paths_fat[-1, :], bins=50, density=True, alpha=0.5, label='Fat-tailed (df=5)', color='orange')
ax.axvline(S0, color='black', linestyle='--', linewidth=2, label='Initial Price')
ax.set_xlabel('Terminal Price ($)')
ax.set_ylabel('Density')
ax.set_title('Terminal Price Distribution: Normal vs Fat-Tailed')
ax.legend()
plt.show()
# Compare tail statistics
print("Tail Statistics Comparison:")
print(f" 1st percentile - Normal: ${np.percentile(paths_normal[-1], 1):.2f}")
print(f" 1st percentile - Fat-tailed: ${np.percentile(paths_fat[-1], 1):.2f}")
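The rescaling step in the fat-tailed simulator can be checked in isolation: after multiplying by $\sqrt{(df-2)/df}$ the Student-t shocks should have variance close to 1 while keeping positive excess kurtosis. A small sketch, with the sample size and seed chosen arbitrarily:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
df = 5
n = 200_000

# Raw Student-t draws have variance df / (df - 2)
raw = rng.standard_t(df, size=n)

# Rescale so the shocks have unit variance, as a standard normal Z would
scaled = raw * np.sqrt((df - 2) / df)

# scipy's kurtosis() returns excess kurtosis by default (normal = 0)
excess_kurt = stats.kurtosis(scaled)

print(f"Variance (raw):    {raw.var():.3f}  (theory {df / (df - 2):.3f})")
print(f"Variance (scaled): {scaled.var():.3f}  (theory 1.000)")
print(f"Excess kurtosis (scaled): {excess_kurt:.2f}  (normal = 0)")
```

Sample kurtosis of a Student-t converges slowly, so the printed value will move noticeably with the seed even at this sample size.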
Exercise 10.2: Monte Carlo VaR Calculator (Guided)
Your Task: Use Monte Carlo simulation to estimate 1-day VaR at different confidence levels.
Fill in the blanks to complete the VaR calculator:
Solution:
def monte_carlo_var(S0: float, mu: float, sigma: float,
                    n_simulations: int = 10000,
                    confidence: float = 0.95) -> dict:
    """
    Calculate 1-day VaR using Monte Carlo simulation.
    """
    rng = np.random.default_rng(seed=42)
    # Calculate daily parameters
    daily_mu = mu / 252
    daily_sigma = sigma / np.sqrt(252)
    # Generate 1-day returns using normal distribution
    sim_returns = rng.normal(daily_mu, daily_sigma, n_simulations)
    # Calculate VaR as negative percentile of returns
    alpha = 1 - confidence
    var = -np.percentile(sim_returns, alpha * 100)
    # Calculate Expected Shortfall
    threshold = np.percentile(sim_returns, alpha * 100)
    es = -np.mean(sim_returns[sim_returns <= threshold])
    return {
        'var': var,
        'var_dollar': var * S0,
        'es': es,
        'es_dollar': es * S0
    }
# Test
result = monte_carlo_var(S0, mu_annual, sigma_annual, confidence=0.95)
print(f"95% VaR: {result['var']*100:.2f}% (${result['var_dollar']:.2f})")
print(f"95% ES: {result['es']*100:.2f}% (${result['es_dollar']:.2f})")
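When the simulated returns are normal, both VaR and Expected Shortfall have closed-form values, which gives a quick sanity check on the Monte Carlo estimates. The sketch below uses assumed annual parameters (10% drift, 18% volatility) rather than the SPY estimates.

```python
import numpy as np
from scipy import stats

# Illustrative annual parameters (assumed values)
mu, sigma, confidence = 0.10, 0.18, 0.95

daily_mu = mu / 252
daily_sigma = sigma / np.sqrt(252)
alpha = 1 - confidence
z_alpha = stats.norm.ppf(alpha)          # about -1.645 for alpha = 0.05

# Closed-form VaR and Expected Shortfall for normally distributed returns
var_analytic = -(daily_mu + daily_sigma * z_alpha)
es_analytic = -daily_mu + daily_sigma * stats.norm.pdf(z_alpha) / alpha

# Monte Carlo estimates of the same quantities
rng = np.random.default_rng(42)
sim = rng.normal(daily_mu, daily_sigma, 200_000)
cutoff = np.percentile(sim, alpha * 100)
var_mc = -cutoff
es_mc = -sim[sim <= cutoff].mean()

print(f"VaR: analytic {var_analytic*100:.3f}%, Monte Carlo {var_mc*100:.3f}%")
print(f"ES:  analytic {es_analytic*100:.3f}%, Monte Carlo {es_mc*100:.3f}%")
```

Monte Carlo earns its keep when no closed form exists, for example with fat-tailed shocks or nonlinear positions; the normal case is mainly useful for validating the machinery.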
Section 10.3: Correlated Simulations
Real portfolios contain multiple assets that are correlated. We need to simulate paths that preserve these correlation structures.
In this section, you will learn:
- Cholesky decomposition for correlation
- Simulating correlated multi-asset paths
- Portfolio value simulation
10.3.1 Cholesky Decomposition
To generate correlated random variables, we use the Cholesky decomposition:
$$\Sigma = LL^T$$
If $Z$ is a vector of independent standard normals, then $X = LZ$ has the desired correlation structure.
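To see why this works, note that the components of $Z$ are independent with unit variance, so $E[ZZ^T] = I$. A short derivation follows, with the two-asset case written out for a correlation $\rho$ (the symbol $\rho$ is introduced here only for illustration):

$$\operatorname{Cov}(X) = E\left[(LZ)(LZ)^T\right] = L\,E[ZZ^T]\,L^T = LL^T = \Sigma$$

$$\Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}, \qquad L = \begin{pmatrix} 1 & 0 \\ \rho & \sqrt{1-\rho^2} \end{pmatrix}, \qquad X_1 = Z_1, \quad X_2 = \rho Z_1 + \sqrt{1-\rho^2}\, Z_2$$

Here $X_2$ has variance $\rho^2 + (1-\rho^2) = 1$ and covariance $\rho$ with $X_1$, as required.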
# Calculate correlation matrix from historical data
corr_matrix = returns.corr()
print("Historical Correlation Matrix:")
print(corr_matrix.round(3))
# Cholesky decomposition
L = cholesky(corr_matrix, lower=True)
print(f"\nCholesky matrix L (lower triangular):")
print(pd.DataFrame(L, index=corr_matrix.index, columns=corr_matrix.columns).round(3))
# Verify: L @ L.T should equal correlation matrix
print(f"\nVerification (L @ L.T):")
print(pd.DataFrame(L @ L.T, index=corr_matrix.index, columns=corr_matrix.columns).round(3))
def simulate_correlated_gbm(S0_vec: list, mu_vec: list, sigma_vec: list,
                            corr_matrix: np.ndarray, T: float,
                            n_steps: int, n_paths: int, seed: int = None) -> dict:
    """
    Simulate correlated multi-asset price paths.
    Returns:
        Dictionary mapping asset index to price paths array
    """
    rng = np.random.default_rng(seed)
    n_assets = len(S0_vec)
    dt = T / n_steps
    # Cholesky decomposition
    L = cholesky(corr_matrix, lower=True)
    # Generate independent normals
    Z = rng.standard_normal((n_steps, n_assets, n_paths))
    # Apply correlation structure
    corr_Z = np.zeros_like(Z)
    for t in range(n_steps):
        corr_Z[t] = L @ Z[t]
    # Calculate price paths for each asset
    paths = {}
    for i in range(n_assets):
        drift = (mu_vec[i] - 0.5 * sigma_vec[i]**2) * dt
        diffusion = sigma_vec[i] * np.sqrt(dt)
        log_returns = drift + diffusion * corr_Z[:, i, :]
        log_prices = np.zeros((n_steps + 1, n_paths))
        log_prices[0] = np.log(S0_vec[i])
        log_prices[1:] = np.log(S0_vec[i]) + np.cumsum(log_returns, axis=0)
        paths[i] = np.exp(log_prices)
    return paths
# Get parameters for all assets
assets = returns.columns.tolist()
S0_vec = [float(prices[a].iloc[-1]) for a in assets]
mu_vec = [float(returns[a].mean() * 252) for a in assets]
sigma_vec = [float(returns[a].std() * np.sqrt(252)) for a in assets]
print("Asset Parameters:")
for i, asset in enumerate(assets):
    print(f"{asset}: S0=${S0_vec[i]:.2f}, mu={mu_vec[i]*100:.1f}%, sigma={sigma_vec[i]*100:.1f}%")
# Simulate correlated paths
T = 1
n_steps = 252
n_paths = 5000
corr_paths = simulate_correlated_gbm(
S0_vec, mu_vec, sigma_vec, corr_matrix.values,
T, n_steps, n_paths, seed=42
)
# Verify correlation is preserved
sim_returns_all = {}
for i, asset in enumerate(assets):
    path_returns = np.diff(np.log(corr_paths[i]), axis=0)
    sim_returns_all[asset] = path_returns.flatten()
sim_returns_df = pd.DataFrame(sim_returns_all)
sim_corr = sim_returns_df.corr()
print("Simulated Correlation Matrix:")
print(sim_corr.round(3))
print("\nDifference from Original (should be near zero):")
print((sim_corr - corr_matrix).round(3))
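The per-step loop that applies the Cholesky factor in `simulate_correlated_gbm` can also be written as one broadcasted matrix product, because NumPy's `@` operator treats a 3-D array as a stack of matrices. The sketch below confirms the two forms agree on a small example; the 3×3 correlation matrix is a hypothetical one, not the historical estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps, n_assets, n_paths = 5, 3, 4

# Hypothetical correlation matrix and its lower-triangular Cholesky factor
corr = np.array([[1.0, 0.6, -0.2],
                 [0.6, 1.0, 0.1],
                 [-0.2, 0.1, 1.0]])
L = np.linalg.cholesky(corr)

# Independent standard normal shocks: one (n_assets, n_paths) matrix per step
Z = rng.standard_normal((n_steps, n_assets, n_paths))

# Explicit loop over time steps
looped = np.zeros_like(Z)
for t in range(n_steps):
    looped[t] = L @ Z[t]

# Broadcasting: L is applied to every (n_assets, n_paths) slice at once
vectorized = L @ Z

print(np.allclose(looped, vectorized))
```

For a year of daily steps the loop runs only 252 times, so the speed difference is modest; the broadcasted form mainly saves lines.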
10.3.2 Portfolio Simulation
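def simulate_portfolio_value(corr_paths: dict, weights: dict,
                             assets: list, initial_value: float = 100000) -> np.ndarray:
    """
    Calculate portfolio value paths from correlated asset simulations.
    Weights are applied to each asset's cumulative return from time 0,
    which corresponds to a buy-and-hold portfolio (no rebalancing) when
    the weights sum to 1.
    """
    # Each path array has n_steps + 1 rows (the initial price plus one per step)
    n_rows, n_paths = corr_paths[0].shape
    port_value = np.ones((n_rows, n_paths))
    for i, asset in enumerate(assets):
        if asset in weights:
            normalized = corr_paths[i] / corr_paths[i][0, :]
            port_value += weights[asset] * (normalized - 1)
    return port_value * initial_value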
# Define portfolio weights
portfolio_weights = {'SPY': 0.4, 'QQQ': 0.2, 'TLT': 0.3, 'GLD': 0.1}
initial_value = 100000
# Calculate portfolio paths
port_paths = simulate_portfolio_value(corr_paths, portfolio_weights, assets, initial_value)
# Visualize
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Left: Portfolio value paths
ax1 = axes[0]
time_grid = np.linspace(0, T, n_steps + 1)
for j in range(min(100, n_paths)):
    ax1.plot(time_grid, port_paths[:, j], alpha=0.05, color='steelblue', linewidth=0.5)
ax1.fill_between(time_grid,
np.percentile(port_paths, 5, axis=1),
np.percentile(port_paths, 95, axis=1),
alpha=0.3, color='steelblue', label='5th-95th percentile')
ax1.plot(time_grid, np.median(port_paths, axis=1), 'k-', linewidth=2, label='Median')
ax1.axhline(initial_value, color='red', linestyle='--', label='Initial Value')
ax1.set_xlabel('Time (years)')
ax1.set_ylabel('Portfolio Value ($)')
ax1.set_title(f'Portfolio Simulation ({n_paths} paths)')
ax1.legend(loc='upper left')
# Right: Terminal value distribution
ax2 = axes[1]
terminal_values = port_paths[-1, :]
ax2.hist(terminal_values / 1000, bins=50, density=True, alpha=0.7, color='steelblue')
ax2.axvline(initial_value / 1000, color='red', linestyle='--', linewidth=2, label=f'Initial: ${initial_value/1000:.0f}k')
ax2.axvline(np.mean(terminal_values) / 1000, color='orange', linewidth=2, label=f'Mean: ${np.mean(terminal_values)/1000:.0f}k')
ax2.set_xlabel('Terminal Value ($k)')
ax2.set_ylabel('Density')
ax2.set_title('Distribution of 1-Year Portfolio Values')
ax2.legend()
plt.tight_layout()
plt.show()
# Summary
print(f"\nPortfolio Simulation Summary (Initial: ${initial_value:,.0f})")
print(f" Expected Value: ${np.mean(terminal_values):,.0f}")
print(f" Probability of Loss: {np.mean(terminal_values < initial_value)*100:.1f}%")
print(f" Probability of >20% Gain: {np.mean(terminal_values > initial_value*1.2)*100:.1f}%")
Exercise 10.3: Portfolio Risk Comparison (Guided)
Your Task: Compare risk metrics for different portfolio allocations using Monte Carlo.
Fill in the blanks to complete the comparison:
Solution:
def calculate_portfolio_risk_metrics(corr_paths: dict, weights: dict,
                                     assets: list, initial_value: float) -> dict:
    """
    Calculate risk metrics for a portfolio from Monte Carlo simulation.
    """
    port_paths = simulate_portfolio_value(corr_paths, weights, assets, initial_value)
    terminal_values = port_paths[-1, :]
    # Calculate terminal returns as decimal fractions (0.05 = 5%)
    terminal_returns = (terminal_values / initial_value - 1)
    # Calculate VaR at 95%
    var_95 = -np.percentile(terminal_returns, 5)
    # Calculate Expected Shortfall
    threshold = np.percentile(terminal_returns, 5)
    es_95 = -np.mean(terminal_returns[terminal_returns <= threshold])
    return {
        'mean_return': np.mean(terminal_returns),
        'volatility': np.std(terminal_returns),
        'var_95': var_95,
        'es_95': es_95,
        'prob_loss': np.mean(terminal_returns < 0)
    }
# Test with two portfolios
aggressive = {'SPY': 0.7, 'QQQ': 0.3, 'TLT': 0.0, 'GLD': 0.0}
defensive = {'SPY': 0.2, 'QQQ': 0.0, 'TLT': 0.4, 'GLD': 0.4}
agg_metrics = calculate_portfolio_risk_metrics(corr_paths, aggressive, assets, initial_value)
def_metrics = calculate_portfolio_risk_metrics(corr_paths, defensive, assets, initial_value)
print(f"Aggressive - Return: {agg_metrics['mean_return']*100:.1f}%, VaR: {agg_metrics['var_95']*100:.1f}%")
print(f"Defensive - Return: {def_metrics['mean_return']*100:.1f}%, VaR: {def_metrics['var_95']*100:.1f}%")
Section 10.4: Applications
In this section, you will learn:
- Option pricing with Monte Carlo
- Retirement planning simulations
- Practical considerations
10.4.1 Option Pricing
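Monte Carlo can price path-dependent and exotic options that lack analytical solutions. A European option is used here because its Black-Scholes price is known in closed form, which gives a benchmark for checking the simulation.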
def monte_carlo_european_option(S0: float, K: float, T: float, r: float,
                                sigma: float, option_type: str = 'call',
                                n_paths: int = 100000, seed: int = None) -> dict:
    """
    Price a European option using Monte Carlo simulation.
    Args:
        S0: Current stock price
        K: Strike price
        T: Time to expiration (years)
        r: Risk-free rate
        sigma: Volatility
        option_type: 'call' or 'put'
    Returns:
        Dictionary with price and statistics
    """
    rng = np.random.default_rng(seed)
    # Simulate terminal stock prices under risk-neutral measure
    Z = rng.standard_normal(n_paths)
    ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)
    # Calculate payoffs
    if option_type == 'call':
        payoffs = np.maximum(ST - K, 0)
    else:
        payoffs = np.maximum(K - ST, 0)
    # Discount expected payoff
    discounted_payoffs = np.exp(-r * T) * payoffs
    price = np.mean(discounted_payoffs)
    std_error = np.std(discounted_payoffs) / np.sqrt(n_paths)
    return {
        'price': price,
        'std_error': std_error,
        'ci_95': (price - 1.96 * std_error, price + 1.96 * std_error)
    }
def black_scholes(S0: float, K: float, T: float, r: float,
                  sigma: float, option_type: str = 'call') -> float:
    """Analytical Black-Scholes price for comparison."""
    d1 = (np.log(S0/K) + (r + 0.5*sigma**2)*T) / (sigma*np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    if option_type == 'call':
        price = S0 * stats.norm.cdf(d1) - K * np.exp(-r*T) * stats.norm.cdf(d2)
    else:
        price = K * np.exp(-r*T) * stats.norm.cdf(-d2) - S0 * stats.norm.cdf(-d1)
    return price
# Price an option
S0_opt = 100
K = 105
T_opt = 0.5
r = 0.05
sigma_opt = 0.20
mc_result = monte_carlo_european_option(S0_opt, K, T_opt, r, sigma_opt, 'call', seed=42)
bs_price = black_scholes(S0_opt, K, T_opt, r, sigma_opt, 'call')
print("European Call Option Pricing")
print(f" Black-Scholes Price: ${bs_price:.4f}")
print(f" Monte Carlo Price: ${mc_result['price']:.4f}")
print(f" MC Standard Error: ${mc_result['std_error']:.4f}")
print(f" Difference: ${abs(mc_result['price'] - bs_price):.4f}")
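A practical consideration when pricing by simulation is how fast the estimate converges: the standard error shrinks in proportion to $1/\sqrt{n}$, so 100 times more paths buys roughly one extra decimal digit. The sketch below illustrates this with a hypothetical helper, `mc_call_price`, that reprices the same call (S0=100, K=105, T=0.5, r=5%, sigma=20%) at several path counts.

```python
import numpy as np

def mc_call_price(n_paths: int, seed: int = 0) -> tuple:
    """European call by Monte Carlo; returns (price, standard error)."""
    # Option inputs match the pricing example in this section
    S0, K, T, r, sigma = 100.0, 105.0, 0.5, 0.05, 0.20
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal(n_paths)
    # Terminal prices under the risk-neutral measure
    ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)
    payoffs = np.exp(-r * T) * np.maximum(ST - K, 0)
    return payoffs.mean(), payoffs.std(ddof=1) / np.sqrt(n_paths)

for n in (1_000, 10_000, 100_000, 1_000_000):
    price, se = mc_call_price(n)
    print(f"n={n:>9,}: price={price:.4f}, std error={se:.4f}")
```

Variance-reduction techniques can improve on this rate's constant factor, but the $1/\sqrt{n}$ scaling itself is a property of plain Monte Carlo averaging.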
10.4.2 Retirement Planning
def retirement_simulation(initial_savings: float, annual_contribution: float,
                          years_to_retirement: int, years_in_retirement: int,
                          annual_withdrawal: float, mu: float, sigma: float,
                          n_paths: int = 10000, seed: int = None) -> dict:
    """
    Simulate retirement outcomes.
    Annual returns are drawn as independent normal variables. Contributions
    are added at the end of each working year, withdrawals are taken at the
    end of each retirement year, and wealth is floored at zero.
    """
    rng = np.random.default_rng(seed)
    total_years = years_to_retirement + years_in_retirement
    # Generate annual returns
    annual_returns = rng.normal(mu, sigma, (total_years, n_paths))
    # Initialize wealth
    wealth = np.zeros((total_years + 1, n_paths))
    wealth[0, :] = initial_savings
    # Accumulation phase
    for year in range(years_to_retirement):
        wealth[year + 1, :] = wealth[year, :] * (1 + annual_returns[year, :]) + annual_contribution
    # Distribution phase
    for year in range(years_to_retirement, total_years):
        new_wealth = wealth[year, :] * (1 + annual_returns[year, :]) - annual_withdrawal
        wealth[year + 1, :] = np.maximum(new_wealth, 0)
    final_wealth = wealth[-1, :]
    return {
        'wealth_paths': wealth,
        'prob_success': np.mean(final_wealth > 0),
        'median_retirement': np.median(wealth[years_to_retirement, :]),
        'median_final': np.median(final_wealth[final_wealth > 0]) if np.sum(final_wealth > 0) > 0 else 0
    }
# Run simulation
result = retirement_simulation(
initial_savings=100000,
annual_contribution=20000,
years_to_retirement=25,
years_in_retirement=30,
annual_withdrawal=80000,
mu=0.07,
sigma=0.15,
seed=42
)
print(f"Retirement Planning Results")
print(f" Probability of Success: {result['prob_success']*100:.1f}%")
print(f" Median Wealth at Retirement: ${result['median_retirement']:,.0f}")
print(f" Median Final Wealth (if successful): ${result['median_final']:,.0f}")
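One way to check the accumulation loop is to switch off randomness. With a constant return the recurrence W[t+1] = W[t]·(1 + r) + c has a closed form, W[n] = W[0]·(1 + r)^n + c·((1 + r)^n - 1)/r. The sketch below compares the two using the savings, contribution, and mean return from the example above; the function names are illustrative only.

```python
def accumulate(initial, contribution, rate, years):
    """Apply the year-by-year recurrence with a constant return."""
    wealth = initial
    for _ in range(years):
        wealth = wealth * (1 + rate) + contribution
    return wealth


def accumulate_closed_form(initial, contribution, rate, years):
    """Closed form of the same recurrence (end-of-year contributions)."""
    growth = (1 + rate) ** years
    return initial * growth + contribution * (growth - 1) / rate


loop_value = accumulate(100_000, 20_000, 0.07, 25)
formula_value = accumulate_closed_form(100_000, 20_000, 0.07, 25)
print(f"Loop:    ${loop_value:,.0f}")
print(f"Formula: ${formula_value:,.0f}")
```

The simulated median at retirement will generally differ from this figure, because random returns compound differently from a constant one. The deterministic value is only a reference point for spotting indexing or off-by-one mistakes.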
Exercise 10.4: Option Price Sensitivity (Open-ended)
Your Task:
Build a function that:
- Calculates option prices for a range of strike prices
- Uses Monte Carlo simulation
- Returns a DataFrame with strike, call price, and put price
- Includes confidence intervals
Your implementation:
Solution:
def option_price_sensitivity(S0: float, T: float, r: float, sigma: float,
strikes: list, n_paths: int = 50000) -> pd.DataFrame:
"""
Calculate option prices for multiple strikes using Monte Carlo.
Args:
S0: Current stock price
T: Time to expiration
r: Risk-free rate
sigma: Volatility
strikes: List of strike prices
Returns:
DataFrame with strikes and option prices
"""
rng = np.random.default_rng(seed=42)
# Simulate terminal prices once
Z = rng.standard_normal(n_paths)
ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)
discount = np.exp(-r * T)
results = []
for K in strikes:
# Call payoffs
call_payoffs = np.maximum(ST - K, 0) * discount
call_price = np.mean(call_payoffs)
call_se = np.std(call_payoffs) / np.sqrt(n_paths)
# Put payoffs
put_payoffs = np.maximum(K - ST, 0) * discount
put_price = np.mean(put_payoffs)
put_se = np.std(put_payoffs) / np.sqrt(n_paths)
results.append({
'Strike': K,
'Call Price': call_price,
'Call CI': f"{call_price:.3f} ± {1.96*call_se:.3f}",
'Put Price': put_price,
'Put CI': f"{put_price:.3f} ± {1.96*put_se:.3f}",
'Moneyness': K / S0
})
return pd.DataFrame(results)
# Test
strikes = [90, 95, 100, 105, 110]
sensitivity_df = option_price_sensitivity(S0=100, T=0.5, r=0.05, sigma=0.20, strikes=strikes)
print(sensitivity_df.to_string(index=False))
Exercise 10.5: Retirement Scenario Analysis (Open-ended)
Your Task:
Build a function that:
- Takes multiple withdrawal rate scenarios
- Runs retirement simulations for each
- Returns success probability for each scenario
- Identifies the safe withdrawal rate (>95% success)
Your implementation:
Solution:
def retirement_scenario_analysis(initial_savings: float, annual_contribution: float,
years_to_retirement: int, years_in_retirement: int,
withdrawal_rates: list, mu: float, sigma: float,
n_paths: int = 5000) -> pd.DataFrame:
"""
Analyze multiple withdrawal rate scenarios.
Args:
withdrawal_rates: List of withdrawal rates as fraction of initial retirement wealth
Returns:
DataFrame with scenario results
"""
rng = np.random.default_rng(seed=42)
total_years = years_to_retirement + years_in_retirement
# Generate returns once
annual_returns = rng.normal(mu, sigma, (total_years, n_paths))
# Accumulation phase (same for all scenarios)
wealth_at_retirement = np.zeros(n_paths)
wealth = initial_savings * np.ones(n_paths)
for year in range(years_to_retirement):
wealth = wealth * (1 + annual_returns[year, :]) + annual_contribution
wealth_at_retirement = wealth.copy()
results = []
for rate in withdrawal_rates:
# Reset to retirement wealth
wealth = wealth_at_retirement.copy()
# Distribution phase with fixed withdrawal rate
annual_withdrawal = wealth_at_retirement * rate
for year in range(years_to_retirement, total_years):
wealth = wealth * (1 + annual_returns[year, :]) - annual_withdrawal
wealth = np.maximum(wealth, 0)
success_rate = np.mean(wealth > 0)
median_final = np.median(wealth[wealth > 0]) if np.sum(wealth > 0) > 0 else 0
results.append({
'Withdrawal Rate': f"{rate*100:.1f}%",
'Annual Withdrawal': f"${np.median(annual_withdrawal):,.0f}",
'Success Rate': f"{success_rate*100:.1f}%",
'Median Final': f"${median_final:,.0f}",
'Safe': '✓' if success_rate >= 0.95 else ''
})
return pd.DataFrame(results)
# Test
withdrawal_rates = [0.03, 0.035, 0.04, 0.045, 0.05, 0.055, 0.06]
scenarios = retirement_scenario_analysis(
initial_savings=100000,
annual_contribution=20000,
years_to_retirement=25,
years_in_retirement=30,
withdrawal_rates=withdrawal_rates,
mu=0.07,
sigma=0.15
)
print(scenarios.to_string(index=False))
Exercise 10.6: Complete Monte Carlo Engine (Open-ended)
Your Task:
Build a comprehensive Monte Carlo simulation class that includes:
- Single and multi-asset GBM simulation
- Normal and fat-tailed distributions
- Portfolio simulation with custom weights
- Risk metrics calculation (VaR, ES)
- Option pricing capabilities
Your implementation:
Solution:
class MonteCarloEngine:
"""
Comprehensive Monte Carlo simulation engine for financial analysis.
Features:
- Single and multi-asset price simulation
- Normal and fat-tailed distributions
- Portfolio simulation
- Risk metrics calculation
- Option pricing
"""
def __init__(self, seed: int = None):
"""Initialize with optional seed for reproducibility."""
self.seed = seed
self.rng = np.random.default_rng(seed)
def reset_seed(self):
"""Reset random number generator."""
self.rng = np.random.default_rng(self.seed)
def simulate_gbm(self, S0: float, mu: float, sigma: float, T: float,
n_steps: int, n_paths: int, fat_tails: bool = False,
df: int = 5) -> np.ndarray:
"""Simulate single-asset price paths."""
dt = T / n_steps
drift = (mu - 0.5 * sigma**2) * dt
if fat_tails:
scale = np.sqrt((df - 2) / df)
diffusion = sigma * np.sqrt(dt) * scale
Z = self.rng.standard_t(df, size=(n_steps, n_paths))
else:
diffusion = sigma * np.sqrt(dt)
Z = self.rng.standard_normal((n_steps, n_paths))
log_returns = drift + diffusion * Z
log_prices = np.zeros((n_steps + 1, n_paths))
log_prices[0] = np.log(S0)
log_prices[1:] = np.log(S0) + np.cumsum(log_returns, axis=0)
return np.exp(log_prices)
def simulate_correlated(self, S0_vec: list, mu_vec: list, sigma_vec: list,
corr_matrix: np.ndarray, T: float,
n_steps: int, n_paths: int) -> dict:
"""Simulate correlated multi-asset paths."""
n_assets = len(S0_vec)
dt = T / n_steps
L = cholesky(corr_matrix, lower=True)
Z = self.rng.standard_normal((n_steps, n_assets, n_paths))
corr_Z = np.zeros_like(Z)
for t in range(n_steps):
corr_Z[t] = L @ Z[t]
paths = {}
for i in range(n_assets):
drift = (mu_vec[i] - 0.5 * sigma_vec[i]**2) * dt
diffusion = sigma_vec[i] * np.sqrt(dt)
log_returns = drift + diffusion * corr_Z[:, i, :]
log_prices = np.zeros((n_steps + 1, n_paths))
log_prices[0] = np.log(S0_vec[i])
log_prices[1:] = np.log(S0_vec[i]) + np.cumsum(log_returns, axis=0)
paths[i] = np.exp(log_prices)
return paths
def calculate_risk_metrics(self, returns: np.ndarray,
confidence: float = 0.95) -> dict:
"""Calculate VaR and ES from simulated returns."""
alpha = 1 - confidence
var = -np.percentile(returns, alpha * 100)
threshold = np.percentile(returns, alpha * 100)
es = -np.mean(returns[returns <= threshold])
return {'var': var, 'es': es, 'confidence': confidence}
def price_european_option(self, S0: float, K: float, T: float,
r: float, sigma: float, option_type: str = 'call',
n_paths: int = 100000) -> dict:
"""Price European option using Monte Carlo."""
Z = self.rng.standard_normal(n_paths)
ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)
if option_type == 'call':
payoffs = np.maximum(ST - K, 0)
else:
payoffs = np.maximum(K - ST, 0)
discounted = np.exp(-r * T) * payoffs
price = np.mean(discounted)
std_error = np.std(discounted) / np.sqrt(n_paths)
return {'price': price, 'std_error': std_error,
'ci_95': (price - 1.96*std_error, price + 1.96*std_error)}
def summary_stats(self, paths: np.ndarray) -> dict:
"""Calculate summary statistics for simulation paths."""
terminal = paths[-1, :]
initial = paths[0, 0]
returns = (terminal - initial) / initial
return {
'mean_terminal': np.mean(terminal),
'median_terminal': np.median(terminal),
'mean_return': np.mean(returns),
'volatility': np.std(returns),
'percentile_5': np.percentile(terminal, 5),
'percentile_95': np.percentile(terminal, 95),
'prob_gain': np.mean(terminal > initial)
}
# Demo
mc = MonteCarloEngine(seed=42)
# Single asset simulation
paths = mc.simulate_gbm(S0=100, mu=0.08, sigma=0.20, T=1, n_steps=252, n_paths=10000)
summary = mc.summary_stats(paths)  # named `summary` so the scipy `stats` module is not shadowed
print(f"Mean terminal: ${summary['mean_terminal']:.2f}")
print(f"Mean return: {summary['mean_return']*100:.1f}%")
print(f"Prob of gain: {summary['prob_gain']*100:.1f}%")
# Option pricing
mc.reset_seed()
option = mc.price_european_option(S0=100, K=100, T=0.5, r=0.05, sigma=0.20)
print(f"\nOption price: ${option['price']:.4f} ± ${option['std_error']:.4f}")
Module Project: Monte Carlo Simulator
Put together everything you've learned!
Your Challenge:
Build a complete Monte Carlo simulation system that includes:
1. Multi-asset correlated price simulation with configurable distribution (normal or Student-t)
2. Portfolio value projection with custom asset weights
3. Comprehensive risk metrics (VaR, ES, probability of loss)
4. Option pricing with Greeks estimation
5. Summary report generation with visualization
# YOUR CODE HERE - Module Project
Solution:
class MonteCarloSimulator:
"""
Complete Monte Carlo simulation system for portfolio analysis.
"""
def __init__(self, seed: int = 42):
self.seed = seed
self.rng = np.random.default_rng(seed)
self.results = {}
def simulate_portfolio(self, assets: dict, corr_matrix: np.ndarray,
weights: dict, T: float = 1, n_steps: int = 252,
n_paths: int = 10000, initial_value: float = 100000,
fat_tails: bool = False, df: int = 5) -> dict:
"""
Simulate portfolio value paths.
Args:
assets: Dict of {name: {'S0': price, 'mu': return, 'sigma': vol}}
corr_matrix: Correlation matrix
weights: Dict of {name: weight}
"""
asset_names = list(assets.keys())
n_assets = len(asset_names)
dt = T / n_steps
# Cholesky decomposition
L = cholesky(corr_matrix, lower=True)
# Generate correlated shocks
if fat_tails:
scale = np.sqrt((df - 2) / df)
Z_raw = self.rng.standard_t(df, size=(n_steps, n_assets, n_paths))
else:
scale = 1.0
Z_raw = self.rng.standard_normal((n_steps, n_assets, n_paths))
Z = np.zeros_like(Z_raw)
for t in range(n_steps):
Z[t] = L @ Z_raw[t]
# Simulate each asset
asset_paths = {}
for i, name in enumerate(asset_names):
params = assets[name]
drift = (params['mu'] - 0.5 * params['sigma']**2) * dt
diffusion = params['sigma'] * np.sqrt(dt) * scale
log_returns = drift + diffusion * Z[:, i, :]
log_prices = np.zeros((n_steps + 1, n_paths))
log_prices[0] = np.log(params['S0'])
log_prices[1:] = np.log(params['S0']) + np.cumsum(log_returns, axis=0)
asset_paths[name] = np.exp(log_prices)
# Calculate portfolio value
port_value = np.ones((n_steps + 1, n_paths))
for name in asset_names:
if name in weights:
normalized = asset_paths[name] / asset_paths[name][0, :]
port_value += weights[name] * (normalized - 1)
port_value *= initial_value
# Store results
self.results = {
'asset_paths': asset_paths,
'portfolio_paths': port_value,
'initial_value': initial_value,
'terminal_values': port_value[-1, :]
}
return self.results
def calculate_risk_metrics(self, confidence: float = 0.95) -> dict:
"""Calculate comprehensive risk metrics."""
terminal = self.results['terminal_values']
initial = self.results['initial_value']
returns = (terminal - initial) / initial
alpha = 1 - confidence
var = -np.percentile(returns, alpha * 100)
threshold = np.percentile(returns, alpha * 100)
es = -np.mean(returns[returns <= threshold])
return {
'mean_return': np.mean(returns),
'volatility': np.std(returns),
f'var_{int(confidence*100)}': var,
f'es_{int(confidence*100)}': es,
'prob_loss': np.mean(returns < 0),
'prob_10pct_loss': np.mean(returns < -0.10),
'prob_20pct_gain': np.mean(returns > 0.20)
}
def generate_report(self) -> None:
"""Generate summary report with visualization."""
metrics = self.calculate_risk_metrics()
print("="*60)
print("MONTE CARLO SIMULATION REPORT")
print("="*60)
print(f"\nInitial Investment: ${self.results['initial_value']:,.0f}")
print(f"\nRETURN STATISTICS")
print(f" Expected Return: {metrics['mean_return']*100:.1f}%")
print(f" Volatility: {metrics['volatility']*100:.1f}%")
print(f"\nRISK METRICS (95%)")
print(f" Value at Risk: {metrics['var_95']*100:.1f}%")
print(f" Expected Shortfall: {metrics['es_95']*100:.1f}%")
print(f"\nPROBABILITIES")
print(f" Probability of Loss: {metrics['prob_loss']*100:.1f}%")
print(f" Probability of >10% Loss: {metrics['prob_10pct_loss']*100:.1f}%")
print(f" Probability of >20% Gain: {metrics['prob_20pct_gain']*100:.1f}%")
# Visualization
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Portfolio paths
paths = self.results['portfolio_paths']
n_steps = paths.shape[0]
time_grid = np.linspace(0, 1, n_steps)  # assumes the default 1-year horizon (T=1)
for i in range(min(100, paths.shape[1])):
axes[0].plot(time_grid, paths[:, i], alpha=0.05, color='steelblue')
axes[0].fill_between(time_grid,
np.percentile(paths, 5, axis=1),
np.percentile(paths, 95, axis=1),
alpha=0.3, color='steelblue')
axes[0].plot(time_grid, np.median(paths, axis=1), 'k-', linewidth=2)
axes[0].axhline(self.results['initial_value'], color='red', linestyle='--')
axes[0].set_xlabel('Time (years)')
axes[0].set_ylabel('Portfolio Value ($)')
axes[0].set_title('Portfolio Value Simulation')
# Terminal distribution
terminal = self.results['terminal_values']
axes[1].hist(terminal/1000, bins=50, density=True, alpha=0.7, color='steelblue')
axes[1].axvline(self.results['initial_value']/1000, color='red', linestyle='--', label='Initial')
axes[1].axvline(np.mean(terminal)/1000, color='orange', linewidth=2, label='Mean')
axes[1].set_xlabel('Terminal Value ($k)')
axes[1].set_ylabel('Density')
axes[1].set_title('Terminal Value Distribution')
axes[1].legend()
plt.tight_layout()
plt.show()
# Demo
simulator = MonteCarloSimulator(seed=42)
# Define assets
assets_config = {
'SPY': {'S0': 450, 'mu': 0.10, 'sigma': 0.18},
'QQQ': {'S0': 380, 'mu': 0.12, 'sigma': 0.25},
'TLT': {'S0': 100, 'mu': 0.04, 'sigma': 0.15},
'GLD': {'S0': 180, 'mu': 0.05, 'sigma': 0.12}
}
corr = np.array([
[1.00, 0.85, -0.30, 0.05],
[0.85, 1.00, -0.25, 0.00],
[-0.30, -0.25, 1.00, 0.25],
[0.05, 0.00, 0.25, 1.00]
])
weights_config = {'SPY': 0.40, 'QQQ': 0.20, 'TLT': 0.30, 'GLD': 0.10}
# Run simulation
simulator.simulate_portfolio(assets_config, corr, weights_config,
T=1, n_paths=10000, initial_value=100000)
simulator.generate_report()
Key Takeaways
What You Learned
1. Random Number Generation
- Use seeds for reproducibility and audit trails
- Modern NumPy Generator objects provide better control
- Different distributions model different phenomena
2. Price Path Simulation
- GBM is the standard model but assumes normally distributed log returns
- Fat-tailed distributions better capture extreme events
- Terminal price distribution is log-normal (positively skewed)
3. Correlated Simulations
- Cholesky decomposition transforms independent normal draws into correlated ones
- Correlation structure is preserved in multi-asset simulations
- Portfolio risk depends on both individual volatilities and correlations
4. Applications
- Monte Carlo can price complex options that are analytically intractable
- Portfolio projections show probability distributions, not point estimates
- Retirement planning benefits from probability-based thinking
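Takeaway 3 can be made concrete with the two-asset case, where the Cholesky factor of a correlation matrix has a simple closed form. The sketch below is a standard-library illustration with hypothetical helper names. It checks that L·Lᵀ reproduces the correlation matrix and shows how one pair of independent draws is mixed into a correlated pair.

```python
from math import sqrt


def cholesky_2x2(rho):
    """Lower-triangular Cholesky factor of [[1, rho], [rho, 1]]."""
    return [[1.0, 0.0], [rho, sqrt(1.0 - rho**2)]]


def times_own_transpose(L):
    """Compute L @ L.T for a small square matrix stored as nested lists."""
    n = len(L)
    return [[sum(L[i][k] * L[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


rho = 0.85  # same magnitude as the SPY/QQQ entry in the demo matrix
L = cholesky_2x2(rho)
reconstructed = times_own_transpose(L)

# Mix two independent standard normal draws into a correlated pair
z1, z2 = 0.5, -1.2            # illustrative values, not random draws
x1 = L[0][0] * z1
x2 = L[1][0] * z1 + L[1][1] * z2

print("L =", L)
print("L @ L.T =", reconstructed)
print("correlated pair:", (x1, x2))
```

The first asset keeps its own shock unchanged, while the second is a weighted blend of both shocks. This is the operation `L @ Z[t]` performs at every time step in the simulator.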
Coming Up Next
In Module 11: Performance Attribution, we'll explore:
- Decomposing portfolio returns into components
- Brinson attribution (allocation vs selection)
- Factor-based attribution
- Risk attribution and budgeting
Congratulations on completing Module 10!
Module 11: Performance Attribution
Course 3: Quantitative Finance & Portfolio Theory
Part 4: Simulation & Analytics
Learning Objectives
By the end of this module, you will be able to:
- Decompose portfolio returns using Brinson attribution
- Apply factor-based attribution using regression analysis
- Calculate risk attribution and contribution metrics
- Build comprehensive attribution reports
| Attribute | Value |
|---|---|
| Duration | ~2.5 hours |
| Exercises | 6 (3 guided + 3 open-ended) |
| Prerequisites | Modules 4-6: Portfolio Theory, Module 9: Factor Models |
Setup and Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
from scipy import stats
import warnings
warnings.filterwarnings('ignore')
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
np.random.seed(42)
print('Module 11: Performance Attribution - Ready!')
Load Data
# Download sector ETF data for attribution analysis
sector_etfs = {
'XLK': 'Technology', 'XLF': 'Financials', 'XLV': 'Healthcare',
'XLE': 'Energy', 'XLY': 'Consumer Disc', 'XLP': 'Consumer Staples',
'XLI': 'Industrials', 'XLU': 'Utilities', 'XLB': 'Materials'
}
tickers = list(sector_etfs.keys()) + ['SPY']
data = yf.download(tickers, start='2020-01-01', end='2024-01-01', progress=False)
if isinstance(data.columns, pd.MultiIndex):
    if 'Adj Close' in data.columns.get_level_values(0):
        prices = data['Adj Close']
    elif 'Close' in data.columns.get_level_values(0):
        prices = data['Close']
else:
    prices = data['Adj Close'] if 'Adj Close' in data.columns else data['Close']
returns = prices.pct_change().dropna()
print(f'Data loaded: {len(returns)} days')
print(f'Sectors: {len(sector_etfs)}')
Section 11.1: Attribution Basics
Performance attribution answers the crucial question: "Why did the portfolio perform the way it did?"
In this section, you will learn:
- Why attribution matters for portfolio management
- Active return and tracking error
- Information ratio for skill measurement
11.1.1 Understanding Active Management
Active return = Portfolio return - Benchmark return
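The three quantities in this section fit together as follows: active return is the period-by-period difference, tracking error is the annualized standard deviation of that difference, and the information ratio is annualized active return divided by tracking error. The sketch below uses six made-up daily returns purely to show the arithmetic. A sample this short produces an unrealistically high ratio, and 252 trading days per year is the usual convention rather than a fixed rule.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical daily returns (illustrative numbers only)
portfolio = [0.012, -0.004, 0.007, 0.001, -0.009, 0.006]
benchmark = [0.010, -0.002, 0.005, 0.002, -0.010, 0.004]

# Active return: portfolio minus benchmark, day by day
active = [p - b for p, b in zip(portfolio, benchmark)]

TRADING_DAYS = 252
ann_active = mean(active) * TRADING_DAYS            # annualized active return
tracking_error = stdev(active) * sqrt(TRADING_DAYS)  # sample std (ddof=1), annualized
information_ratio = ann_active / tracking_error

print(f"Annualized active return: {ann_active:.2%}")
print(f"Tracking error:           {tracking_error:.2%}")
print(f"Information ratio:        {information_ratio:.2f}")
```

`statistics.stdev` uses the sample (n - 1) denominator, which matches the default of `pandas.Series.std` used in the course code.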
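# Define portfolio and benchmark weights
# (nine sectors only: the benchmark weights sum to 0.90 and the portfolio weights to 0.95, not 1.0)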
benchmark_weights = {
'XLK': 0.28, 'XLF': 0.13, 'XLV': 0.13, 'XLE': 0.04,
'XLY': 0.10, 'XLP': 0.07, 'XLI': 0.09, 'XLU': 0.03, 'XLB': 0.03
}
portfolio_weights = {
'XLK': 0.35, 'XLF': 0.08, 'XLV': 0.15, 'XLE': 0.02,
'XLY': 0.12, 'XLP': 0.05, 'XLI': 0.10, 'XLU': 0.03, 'XLB': 0.05
}
# Calculate returns
sector_tickers = list(sector_etfs.keys())
sector_returns = returns[sector_tickers]
port_weights = np.array([portfolio_weights[t] for t in sector_tickers])
bench_weights = np.array([benchmark_weights[t] for t in sector_tickers])
portfolio_returns = (sector_returns * port_weights).sum(axis=1)
benchmark_returns = returns['SPY']
active_returns = portfolio_returns - benchmark_returns
# Performance summary
print("Performance Summary (Annualized)")
print("="*50)
print(f"Portfolio Return: {portfolio_returns.mean() * 252 * 100:.2f}%")
print(f"Benchmark Return: {benchmark_returns.mean() * 252 * 100:.2f}%")
print(f"Active Return: {active_returns.mean() * 252 * 100:.2f}%")
print(f"\nTracking Error: {active_returns.std() * np.sqrt(252) * 100:.2f}%")
print(f"Information Ratio: {(active_returns.mean() * 252) / (active_returns.std() * np.sqrt(252)):.2f}")
Exercise 11.1: Active Return Analysis (Guided)
Your Task: Calculate comprehensive active return statistics including hit rate and win/loss ratio.
Fill in the blanks to complete the analysis:
Solution:
def analyze_active_returns(portfolio_ret: pd.Series, benchmark_ret: pd.Series) -> dict:
    """
    Analyze active return characteristics.
    """
    active = portfolio_ret - benchmark_ret
    # Calculate the mean of active returns
    mean_active = active.mean()
    # Calculate hit rate (proportion of positive active returns)
    hit_rate = (active > 0).mean()
    # Calculate average win and average loss
    avg_win = active[active > 0].mean()
    avg_loss = active[active < 0].mean()
    # Calculate win/loss ratio
    win_loss_ratio = abs(avg_win) / abs(avg_loss)
    return {
        'mean_active': mean_active * 252,
        'tracking_error': active.std() * np.sqrt(252),
        'information_ratio': (mean_active * 252) / (active.std() * np.sqrt(252)),
        'hit_rate': hit_rate,
        'win_loss_ratio': win_loss_ratio
    }
# Test
metrics = analyze_active_returns(portfolio_returns, benchmark_returns)
print(f"Hit Rate: {metrics['hit_rate']*100:.1f}%")
print(f"Win/Loss Ratio: {metrics['win_loss_ratio']:.2f}")
print(f"Information Ratio: {metrics['information_ratio']:.2f}")
Section 11.2: Brinson Attribution
The Brinson model decomposes active return into allocation, selection, and interaction effects.
In this section, you will learn:
- Allocation effect: Value from sector weighting decisions
- Selection effect: Value from security selection within sectors
- Interaction effect: Combined impact
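A two-sector worked example makes the decomposition easier to follow. The sketch below uses hypothetical sector names, weights, and returns. It assumes both weight sets sum to one and that the benchmark total return is the weighted sum of benchmark sector returns. Under those assumptions the three effects add up exactly to the active return.

```python
def brinson_effects(wp, wb, rp, rb):
    """Per-sector allocation, selection, and interaction effects."""
    rb_total = sum(wb[s] * rb[s] for s in wb)  # benchmark total return
    rows = {}
    for s in wp:
        active_weight = wp[s] - wb[s]
        rows[s] = {
            "allocation": active_weight * (rb[s] - rb_total),
            "selection": wb[s] * (rp[s] - rb[s]),
            "interaction": active_weight * (rp[s] - rb[s]),
        }
    return rows, rb_total


# Hypothetical inputs: weights sum to 1 in both portfolios
wp = {"Tech": 0.60, "Energy": 0.40}   # portfolio weights
wb = {"Tech": 0.50, "Energy": 0.50}   # benchmark weights
rp = {"Tech": 0.12, "Energy": 0.03}   # portfolio sector returns
rb = {"Tech": 0.10, "Energy": 0.04}   # benchmark sector returns

effects, rb_total = brinson_effects(wp, wb, rp, rb)
rp_total = sum(wp[s] * rp[s] for s in wp)
active_return = rp_total - rb_total
explained = sum(sum(row.values()) for row in effects.values())

for sector, row in effects.items():
    print(sector, {k: round(v, 4) for k, v in row.items()})
print(f"Active return: {active_return:.4f}  Sum of effects: {explained:.4f}")
```

If the weights do not sum to one, or the benchmark total comes from a different source than the sector returns, the effects will no longer reconcile exactly to the active return.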
def brinson_attribution(portfolio_weights: dict, benchmark_weights: dict,
                        portfolio_returns: dict, benchmark_sector_returns: dict,
                        benchmark_total_return: float) -> pd.DataFrame:
    """
    Calculate Brinson attribution for a single period.
    Allocation Effect: (wp - wb) * (rb - rb_total)
    Selection Effect: wb * (rp - rb)
    Interaction Effect: (wp - wb) * (rp - rb)
    """
    results = []
    for sector in portfolio_weights.keys():
        wp = portfolio_weights[sector]
        wb = benchmark_weights[sector]
        rp = portfolio_returns[sector]
        rb = benchmark_sector_returns[sector]
        rb_total = benchmark_total_return
        allocation = (wp - wb) * (rb - rb_total)
        selection = wb * (rp - rb)
        interaction = (wp - wb) * (rp - rb)
        total = allocation + selection + interaction
        results.append({
            'Sector': sector,
            'Active Weight': wp - wb,
            'Allocation': allocation,
            'Selection': selection,
            'Interaction': interaction,
            'Total': total
        })
    return pd.DataFrame(results)
# Calculate cumulative returns for the period
cum_returns = (1 + sector_returns).prod() - 1
benchmark_cum = (1 + benchmark_returns).prod() - 1
# Add simulated selection alpha
np.random.seed(42)
selection_alpha = {ticker: np.random.normal(0, 0.03) for ticker in sector_tickers}
portfolio_sector_returns = {t: float(cum_returns[t]) + selection_alpha[t] for t in sector_tickers}
benchmark_sector_returns = {t: float(cum_returns[t]) for t in sector_tickers}
# Run attribution
attribution_df = brinson_attribution(
portfolio_weights, benchmark_weights,
portfolio_sector_returns, benchmark_sector_returns,
float(benchmark_cum)
)
print("Brinson Attribution Results")
print("="*70)
print(f"Total Allocation: {attribution_df['Allocation'].sum()*100:.2f}%")
print(f"Total Selection: {attribution_df['Selection'].sum()*100:.2f}%")
print(f"Total Interaction: {attribution_df['Interaction'].sum()*100:.2f}%")
print(f"Total Active: {attribution_df['Total'].sum()*100:.2f}%")
Exercise 11.2: Monthly Brinson Attribution (Guided)
Your Task: Calculate Brinson attribution on a monthly basis to track attribution over time.
Fill in the blanks to complete the time-series attribution:
Solution:
def monthly_brinson(sector_returns: pd.DataFrame, benchmark_returns: pd.Series,
                    portfolio_weights: dict, benchmark_weights: dict) -> pd.DataFrame:
    """
    Calculate monthly Brinson attribution.
    """
    # Resample to monthly returns using compound formula
    # ('M' is the month-end alias; pandas 2.2+ deprecates it in favor of 'ME')
    monthly_sector = sector_returns.resample('M').apply(lambda x: (1 + x).prod() - 1)
    monthly_bench = benchmark_returns.resample('M').apply(lambda x: (1 + x).prod() - 1)
    results = []
    for date in monthly_sector.index:
        sector_ret = monthly_sector.loc[date].to_dict()
        bench_ret = float(monthly_bench.loc[date])
        # Simulate selection alpha (seeded per month so results are repeatable)
        np.random.seed(int(date.timestamp()) % 10000)
        port_ret = {t: sector_ret[t] + np.random.normal(0, 0.01) for t in sector_ret}
        # Calculate allocation effect
        allocation = sum([
            (portfolio_weights[s] - benchmark_weights[s]) * (sector_ret[s] - bench_ret)
            for s in portfolio_weights
        ])
        # Calculate selection effect
        selection = sum([
            benchmark_weights[s] * (port_ret[s] - sector_ret[s])
            for s in portfolio_weights
        ])
        # The interaction effect is omitted here, so Total = allocation + selection
        results.append({
            'Date': date,
            'Allocation': allocation,
            'Selection': selection,
            'Total': allocation + selection
        })
    return pd.DataFrame(results).set_index('Date')
# Test
monthly_attr = monthly_brinson(sector_returns, benchmark_returns, portfolio_weights, benchmark_weights)
print(f"Mean Monthly Allocation: {monthly_attr['Allocation'].mean()*100:.3f}%")
print(f"Mean Monthly Selection: {monthly_attr['Selection'].mean()*100:.3f}%")
print(f"Allocation Hit Rate: {(monthly_attr['Allocation'] > 0).mean()*100:.1f}%")
Section 11.3: Factor Attribution
Factor attribution uses regression to decompose returns by systematic factor exposures.
In this section, you will learn:
- Regression-based factor attribution
- Alpha and beta decomposition
- Factor contribution analysis
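With a single factor, the regression behind factor attribution reduces to two familiar formulas: beta = cov(y, x) / var(x) and alpha = mean(y) - beta·mean(x). The sketch below builds a hypothetical portfolio that is an exact linear function of a made-up market series, so the fit should recover the known alpha and beta. It also shows that the mean return splits into alpha plus the factor contribution.

```python
from statistics import mean


def ols_single_factor(y, x):
    """Return (alpha, beta) from a one-factor least-squares fit."""
    x_bar, y_bar = mean(x), mean(y)
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    var_x = sum((xi - x_bar) ** 2 for xi in x)
    beta = cov_xy / var_x
    alpha = y_bar - beta * x_bar
    return alpha, beta


# Hypothetical daily market returns and a portfolio with known alpha/beta
market = [0.010, -0.020, 0.015, 0.005, -0.010]
true_alpha, true_beta = 0.001, 1.2
portfolio = [true_alpha + true_beta * m for m in market]

alpha, beta = ols_single_factor(portfolio, market)
factor_contribution = beta * mean(market)

print(f"alpha={alpha:.4f}  beta={beta:.2f}")
print(f"mean return={mean(portfolio):.5f}  "
      f"alpha + contribution={alpha + factor_contribution:.5f}")
```

Real return series include noise, so the fit is not exact there, but the identity mean(y) = alpha + Σ beta·mean(factor) still holds for an OLS regression with an intercept.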
# Create factor proxies
factor_tickers = ['IWM', 'IWF', 'IWD'] # Small cap, Growth, Value
factor_data = yf.download(factor_tickers + ['SPY'], start='2020-01-01', end='2024-01-01', progress=False)
if isinstance(factor_data.columns, pd.MultiIndex):
    factor_prices = factor_data['Adj Close'] if 'Adj Close' in factor_data.columns.get_level_values(0) else factor_data['Close']
else:
    factor_prices = factor_data
factor_returns = factor_prices.pct_change().dropna()
# Create factor returns (SmallCap = small cap minus market, Value = value minus growth)
factors_df = pd.DataFrame({
'Market': factor_returns['SPY'],
'SmallCap': factor_returns['IWM'] - factor_returns['SPY'],
'Value': factor_returns['IWD'] - factor_returns['IWF']
})
print("Factor Statistics:")
print(factors_df.describe().round(4))
def factor_attribution(portfolio_returns: pd.Series, factors: pd.DataFrame) -> dict:
    """
    Perform factor attribution using OLS regression.
    Returns:
        Dictionary with alpha, betas, and factor contributions
    """
    # Align data
    common_idx = portfolio_returns.index.intersection(factors.index)
    y = portfolio_returns.loc[common_idx]
    X = factors.loc[common_idx]
    # Add constant for alpha
    X_const = np.column_stack([np.ones(len(X)), X.values])
    # OLS regression
    coeffs = np.linalg.lstsq(X_const, y.values, rcond=None)[0]
    alpha = coeffs[0]
    betas = dict(zip(X.columns, coeffs[1:]))
    # Factor contributions
    factor_contrib = {f: betas[f] * X[f].mean() * 252 for f in betas}
    # R-squared
    y_pred = X_const @ coeffs
    ss_res = np.sum((y.values - y_pred)**2)
    ss_tot = np.sum((y.values - np.mean(y.values))**2)
    r_squared = 1 - ss_res / ss_tot
    return {
        'alpha': alpha * 252,
        'betas': betas,
        'factor_contributions': factor_contrib,
        'r_squared': r_squared,
        'total_return': y.mean() * 252
    }
# Run factor attribution
common_dates = portfolio_returns.index.intersection(factors_df.index)
port_aligned = portfolio_returns.loc[common_dates]
factor_attr = factor_attribution(port_aligned, factors_df)
print("Factor Attribution Results")
print("="*50)
print(f"Alpha (annualized): {factor_attr['alpha']*100:.2f}%")
print(f"R-squared: {factor_attr['r_squared']*100:.1f}%")
print(f"\nFactor Betas:")
for factor, beta in factor_attr['betas'].items():
print(f" {factor}: {beta:.3f}")
print(f"\nFactor Contributions (annualized):")
for factor, contrib in factor_attr['factor_contributions'].items():
print(f" {factor}: {contrib*100:.2f}%")
Exercise 11.3: Rolling Factor Attribution (Guided)
Your Task: Calculate rolling factor exposures to track style drift over time.
Fill in the blanks to complete the rolling analysis:
Solution:
def rolling_factor_betas(portfolio_returns: pd.Series, factors: pd.DataFrame,
                         window: int = 60) -> pd.DataFrame:
    """
    Calculate rolling factor betas.
    """
    common_idx = portfolio_returns.index.intersection(factors.index)
    y = portfolio_returns.loc[common_idx]
    X = factors.loc[common_idx]
    results = []
    # Loop through dates starting from window-1 index
    for i in range(window - 1, len(y)):
        # Get window of data
        y_window = y.iloc[i - window + 1:i + 1]
        X_window = X.iloc[i - window + 1:i + 1]
        # Add constant and run regression
        X_const = np.column_stack([np.ones(len(X_window)), X_window.values])
        coeffs = np.linalg.lstsq(X_const, y_window.values, rcond=None)[0]
        row = {'Date': y.index[i], 'Alpha': coeffs[0] * 252}
        for j, col in enumerate(X.columns):
            row[col] = coeffs[j + 1]
        results.append(row)
    return pd.DataFrame(results).set_index('Date')
# Test
rolling_betas = rolling_factor_betas(port_aligned, factors_df, window=60)
print(f"Average Market Beta: {rolling_betas['Market'].mean():.2f}")
print(f"Market Beta Range: {rolling_betas['Market'].min():.2f} to {rolling_betas['Market'].max():.2f}")
print(f"Average Alpha: {rolling_betas['Alpha'].mean()*100:.2f}%")
Section 11.4: Risk Attribution
Risk attribution decomposes portfolio risk into contributions from each position.
In this section, you will learn:
- Marginal and component contribution to risk
- Risk budgeting analysis
- Diversification measurement
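A two-asset case shows why weight and risk contribution can differ sharply. The sketch below uses hypothetical volatilities of 20% and 10%, a correlation of 0.30, and equal weights. It computes marginal, component, and percentage contributions with plain lists, then checks that the component contributions add up to portfolio volatility.

```python
from math import sqrt


def risk_contributions(weights, cov):
    """Return (portfolio vol, marginal, component, percentage) contributions."""
    n = len(weights)
    # Sigma @ w
    sigma_w = [sum(cov[i][j] * weights[j] for j in range(n)) for i in range(n)]
    port_vol = sqrt(sum(weights[i] * sigma_w[i] for i in range(n)))
    mcr = [s / port_vol for s in sigma_w]          # marginal contribution
    ccr = [w * m for w, m in zip(weights, mcr)]    # component contribution
    pcr = [c / port_vol for c in ccr]              # share of total risk
    return port_vol, mcr, ccr, pcr


# Hypothetical inputs
vol_a, vol_b, rho = 0.20, 0.10, 0.30
cov = [[vol_a**2, rho * vol_a * vol_b],
       [rho * vol_a * vol_b, vol_b**2]]
weights = [0.5, 0.5]

port_vol, mcr, ccr, pcr = risk_contributions(weights, cov)
print(f"Portfolio vol: {port_vol:.4f}")
for name, w, p in zip(["A", "B"], weights, pcr):
    print(f"Asset {name}: weight {w:.0%} -> risk share {p:.1%}")
```

Although the capital is split evenly, the more volatile asset accounts for roughly three quarters of the portfolio's risk in this example. Risk budgeting is designed to expose exactly this kind of imbalance.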
def risk_attribution(weights: np.ndarray, cov_matrix: np.ndarray) -> dict:
    """
    Calculate risk attribution metrics.
    MCR = (Sigma @ w) / sigma_p
    CCR = w * MCR
    PCR = CCR / sigma_p
    """
    port_var = weights @ cov_matrix @ weights
    port_vol = np.sqrt(port_var)
    # Marginal contribution
    mcr = (cov_matrix @ weights) / port_vol
    # Component contribution
    ccr = weights * mcr
    # Percentage contribution
    pcr = ccr / port_vol
    return {
        'portfolio_vol': port_vol,
        'marginal_risk': mcr,
        'component_risk': ccr,
        'percent_risk': pcr
    }
# Calculate risk attribution
cov_matrix = sector_returns.cov() * 252
weights_array = np.array([portfolio_weights[t] for t in sector_tickers])
risk_attr = risk_attribution(weights_array, cov_matrix.values)
print(f"Portfolio Volatility: {risk_attr['portfolio_vol']*100:.2f}%")
print(f"\nRisk Contribution by Sector:")
for i, ticker in enumerate(sector_tickers):
print(f" {sector_etfs[ticker]:18}: Weight {weights_array[i]*100:5.1f}% -> Risk {risk_attr['percent_risk'][i]*100:5.1f}%")
Exercise 11.4: Complete Attribution System (Open-ended)
Your Task:
Build a function that:
- Combines Brinson and factor attribution
- Includes risk contribution analysis
- Returns a comprehensive attribution report
Your implementation:
Solution:
def comprehensive_attribution(portfolio_returns: pd.Series, benchmark_returns: pd.Series,
sector_returns: pd.DataFrame, factors: pd.DataFrame,
portfolio_weights: dict, benchmark_weights: dict) -> dict:
"""
Comprehensive attribution combining multiple methods.
"""
# Active return analysis
active = portfolio_returns - benchmark_returns
active_metrics = {
'active_return': active.mean() * 252,
'tracking_error': active.std() * np.sqrt(252),
'information_ratio': (active.mean() * 252) / (active.std() * np.sqrt(252))
}
# Factor attribution
common_idx = portfolio_returns.index.intersection(factors.index)
y = portfolio_returns.loc[common_idx]
X = factors.loc[common_idx]
X_const = np.column_stack([np.ones(len(X)), X.values])
coeffs = np.linalg.lstsq(X_const, y.values, rcond=None)[0]
factor_metrics = {
'alpha': coeffs[0] * 252,
'betas': dict(zip(X.columns, coeffs[1:]))
}
# Risk attribution
sector_tickers = list(portfolio_weights.keys())
cov_matrix = sector_returns[sector_tickers].cov() * 252
weights = np.array([portfolio_weights[t] for t in sector_tickers])
port_vol = np.sqrt(weights @ cov_matrix.values @ weights)
mcr = (cov_matrix.values @ weights) / port_vol
pcr = (weights * mcr) / port_vol
risk_metrics = {
'portfolio_vol': port_vol,
'risk_contributions': dict(zip(sector_tickers, pcr))
}
return {
'active_metrics': active_metrics,
'factor_metrics': factor_metrics,
'risk_metrics': risk_metrics
}
# Test
report = comprehensive_attribution(
portfolio_returns, benchmark_returns, sector_returns, factors_df,
portfolio_weights, benchmark_weights
)
print("COMPREHENSIVE ATTRIBUTION REPORT")
print("="*50)
print(f"\nACTIVE RETURN ANALYSIS")
print(f" Active Return: {report['active_metrics']['active_return']*100:.2f}%")
print(f" Tracking Error: {report['active_metrics']['tracking_error']*100:.2f}%")
print(f" Information Ratio: {report['active_metrics']['information_ratio']:.2f}")
print(f"\nFACTOR ATTRIBUTION")
print(f" Alpha: {report['factor_metrics']['alpha']*100:.2f}%")
for f, b in report['factor_metrics']['betas'].items():
print(f" {f} Beta: {b:.3f}")
print(f"\nRISK ATTRIBUTION")
print(f" Portfolio Vol: {report['risk_metrics']['portfolio_vol']*100:.2f}%")
Exercise 11.5: Attribution Visualization (Open-ended)
Your Task:
Build a function that creates professional attribution visualizations:
- Waterfall chart for Brinson effects
- Bar chart comparing weight vs risk contribution
- Time series of rolling attribution
Your implementation:
Solution:
def plot_attribution_dashboard(attribution_df: pd.DataFrame,
                               risk_attr: dict, sector_names: dict) -> None:
    """
    Create comprehensive attribution visualizations.
    """
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))
    # 1. Brinson waterfall
    ax1 = axes[0, 0]
    components = ['Allocation', 'Selection', 'Interaction']
    values = [attribution_df[c].sum() * 100 for c in components]
    colors = ['green' if v > 0 else 'red' for v in values]
    ax1.bar(components, values, color=colors, alpha=0.7)
    ax1.axhline(0, color='black', linewidth=0.5)
    ax1.set_ylabel('Contribution (%)')
    ax1.set_title('Brinson Attribution Components')
    for i, v in enumerate(values):
        ax1.text(i, v + 0.2 * np.sign(v), f'{v:.2f}%', ha='center')
    # 2. Weight vs Risk by sector
    ax2 = axes[0, 1]
    sectors = [sector_names[t] for t in attribution_df['Sector']]
    x = np.arange(len(sectors))
    width = 0.35
    weights = attribution_df['Active Weight'].values + 0.1  # Approximate total weight
    risk_pcts = risk_attr['percent_risk']
    ax2.bar(x - width/2, weights * 100, width, label='Weight', color='steelblue')
    ax2.bar(x + width/2, risk_pcts * 100, width, label='Risk Contribution', color='orange')
    ax2.set_xticks(x)
    ax2.set_xticklabels(sectors, rotation=45, ha='right')
    ax2.set_ylabel('Percentage (%)')
    ax2.set_title('Weight vs Risk Contribution')
    ax2.legend()
    # 3. Attribution by sector
    ax3 = axes[1, 0]
    total_by_sector = attribution_df['Total'].values * 100
    colors = ['green' if v > 0 else 'red' for v in total_by_sector]
    ax3.barh(sectors, total_by_sector, color=colors, alpha=0.7)
    ax3.axvline(0, color='black', linewidth=0.5)
    ax3.set_xlabel('Total Attribution (%)')
    ax3.set_title('Attribution by Sector')
    # 4. Risk contribution pie
    ax4 = axes[1, 1]
    risk_sorted = sorted(zip(sectors, risk_pcts), key=lambda x: x[1], reverse=True)
    labels = [s for s, _ in risk_sorted[:5]] + ['Other']
    sizes = [r for _, r in risk_sorted[:5]] + [sum(r for _, r in risk_sorted[5:])]
    ax4.pie(sizes, labels=labels, autopct='%1.1f%%')
    ax4.set_title('Risk Contribution Distribution')
    plt.tight_layout()
    plt.show()
# Test
plot_attribution_dashboard(attribution_df, risk_attr, sector_etfs)
Exercise 11.6: Attribution Report Generator (Open-ended)
Your Task:
Build a class that generates a complete attribution report including:
- Executive summary with key metrics
- Detailed Brinson attribution by sector
- Factor exposure analysis
- Risk decomposition
- Time series analysis
Your implementation:
Click to reveal solution
class AttributionReportGenerator:
    """
    Comprehensive attribution report generator.
    """
    def __init__(self, portfolio_returns: pd.Series, benchmark_returns: pd.Series):
        self.portfolio_returns = portfolio_returns
        self.benchmark_returns = benchmark_returns
        common_idx = portfolio_returns.index.intersection(benchmark_returns.index)
        self.active_returns = portfolio_returns.loc[common_idx] - benchmark_returns.loc[common_idx]
    def calculate_basic_stats(self) -> dict:
        """Calculate basic performance statistics."""
        return {
            'portfolio_return': self.portfolio_returns.mean() * 252,
            'benchmark_return': self.benchmark_returns.mean() * 252,
            'active_return': self.active_returns.mean() * 252,
            'tracking_error': self.active_returns.std() * np.sqrt(252),
            'information_ratio': (self.active_returns.mean() * 252) /
                                 (self.active_returns.std() * np.sqrt(252)),
            'hit_rate': (self.active_returns > 0).mean()
        }
    def factor_attribution(self, factors: pd.DataFrame) -> dict:
        """Factor-based attribution."""
        common_idx = self.portfolio_returns.index.intersection(factors.index)
        y = self.portfolio_returns.loc[common_idx]
        X = factors.loc[common_idx]
        X_const = np.column_stack([np.ones(len(X)), X.values])
        coeffs = np.linalg.lstsq(X_const, y.values, rcond=None)[0]
        y_pred = X_const @ coeffs
        r_squared = 1 - np.sum((y - y_pred)**2) / np.sum((y - y.mean())**2)
        return {
            'alpha': coeffs[0] * 252,
            'betas': dict(zip(X.columns, coeffs[1:])),
            'r_squared': r_squared
        }
    def generate_report(self, factors: pd.DataFrame = None) -> None:
        """Generate formatted attribution report."""
        stats = self.calculate_basic_stats()
        print("\n" + "="*60)
        print("PERFORMANCE ATTRIBUTION REPORT")
        print("="*60)
        print("\n--- PERFORMANCE SUMMARY ---")
        print(f"Portfolio Return: {stats['portfolio_return']*100:>8.2f}%")
        print(f"Benchmark Return: {stats['benchmark_return']*100:>8.2f}%")
        print(f"Active Return: {stats['active_return']*100:>8.2f}%")
        print("\n--- RISK METRICS ---")
        print(f"Tracking Error: {stats['tracking_error']*100:>8.2f}%")
        print(f"Information Ratio: {stats['information_ratio']:>8.2f}")
        print(f"Hit Rate: {stats['hit_rate']*100:>8.1f}%")
        if factors is not None:
            factor_stats = self.factor_attribution(factors)
            print("\n--- FACTOR ATTRIBUTION ---")
            print(f"Alpha: {factor_stats['alpha']*100:>8.2f}%")
            print(f"R-squared: {factor_stats['r_squared']*100:>8.1f}%")
            print("\nFactor Betas:")
            for factor, beta in factor_stats['betas'].items():
                print(f" {factor:15}: {beta:>8.3f}")
        print("\n" + "="*60)
# Test
report_gen = AttributionReportGenerator(portfolio_returns, benchmark_returns)
report_gen.generate_report(factors_df)
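To see what the least-squares step in factor_attribution recovers, here is a minimal sketch on synthetic data. The two factors, the "true" alpha and betas, and the noise level are all made-up assumptions for illustration; only the regression steps mirror the solution above.

```python
import numpy as np

# Synthetic daily data: every number here is invented for illustration.
rng = np.random.default_rng(0)
n_days = 500
factors = rng.normal(0.0, 0.01, size=(n_days, 2))   # two hypothetical factors
true_alpha = 0.0002                                  # hypothetical daily alpha
true_betas = np.array([1.1, -0.4])                   # hypothetical exposures
noise = rng.normal(0.0, 0.002, size=n_days)
y = true_alpha + factors @ true_betas + noise

# Same steps as factor_attribution: prepend a constant column, then least squares.
X_const = np.column_stack([np.ones(n_days), factors])
coeffs = np.linalg.lstsq(X_const, y, rcond=None)[0]

# R-squared: share of return variance explained by the fitted factor model.
y_pred = X_const @ coeffs
r_squared = 1 - np.sum((y - y_pred) ** 2) / np.sum((y - y.mean()) ** 2)

print("annualized alpha:", coeffs[0] * 252)
print("estimated betas:", coeffs[1:])
print("R-squared:", r_squared)
```

The estimated betas should land close to the values used to generate the data, while the alpha estimate is much noisier; that asymmetry is typical and is one reason to treat a regression alpha with caution.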
Module Project: Attribution System
Put together everything you've learned!
Your Challenge:
Build a complete performance attribution system that includes:
1. Brinson attribution (allocation, selection, interaction)
2. Factor attribution with rolling exposures
3. Risk attribution and contribution analysis
4. Time series attribution tracking
5. Professional report generation
# YOUR CODE HERE - Module Project
Click to reveal solution
class PerformanceAttributionSystem:
    """
    Complete performance attribution system.
    """
    def __init__(self, portfolio_returns: pd.Series, benchmark_returns: pd.Series):
        common_idx = portfolio_returns.index.intersection(benchmark_returns.index)
        self.portfolio_returns = portfolio_returns.loc[common_idx]
        self.benchmark_returns = benchmark_returns.loc[common_idx]
        self.active_returns = self.portfolio_returns - self.benchmark_returns
    def basic_metrics(self) -> dict:
        """Calculate basic performance metrics."""
        return {
            'portfolio_return': self.portfolio_returns.mean() * 252,
            'benchmark_return': self.benchmark_returns.mean() * 252,
            'active_return': self.active_returns.mean() * 252,
            'tracking_error': self.active_returns.std() * np.sqrt(252),
            'information_ratio': (self.active_returns.mean() * 252) /
                                 (self.active_returns.std() * np.sqrt(252)),
            'sharpe_portfolio': (self.portfolio_returns.mean() * 252) /
                                (self.portfolio_returns.std() * np.sqrt(252)),
            'hit_rate': (self.active_returns > 0).mean()
        }
    def brinson_attribution(self, portfolio_weights: dict, benchmark_weights: dict,
                            sector_returns: pd.DataFrame) -> pd.DataFrame:
        """Single-period Brinson attribution."""
        cum_sector = (1 + sector_returns).prod() - 1
        cum_bench = (1 + self.benchmark_returns).prod() - 1
        results = []
        for sector in portfolio_weights:
            wp = portfolio_weights[sector]
            wb = benchmark_weights.get(sector, 0)
            rb = float(cum_sector.get(sector, 0))
            # Simulated selection: placeholder noise, so results change on every run
            rp = rb + np.random.normal(0, 0.02)
            allocation = (wp - wb) * (rb - cum_bench)
            selection = wb * (rp - rb)
            interaction = (wp - wb) * (rp - rb)
            results.append({
                'Sector': sector,
                'Allocation': allocation,
                'Selection': selection,
                'Interaction': interaction,
                'Total': allocation + selection + interaction
            })
        return pd.DataFrame(results)
    def factor_attribution(self, factors: pd.DataFrame) -> dict:
        """Factor-based attribution."""
        common_idx = self.portfolio_returns.index.intersection(factors.index)
        y = self.portfolio_returns.loc[common_idx]
        X = factors.loc[common_idx]
        X_const = np.column_stack([np.ones(len(X)), X.values])
        coeffs = np.linalg.lstsq(X_const, y.values, rcond=None)[0]
        y_pred = X_const @ coeffs
        r_squared = 1 - np.sum((y - y_pred)**2) / np.sum((y - y.mean())**2)
        return {
            'alpha': coeffs[0] * 252,
            'betas': dict(zip(X.columns, coeffs[1:])),
            'factor_contrib': {f: coeffs[i+1] * X[f].mean() * 252
                               for i, f in enumerate(X.columns)},
            'r_squared': r_squared
        }
    def risk_attribution(self, weights: dict, cov_matrix: pd.DataFrame) -> dict:
        """Risk contribution analysis."""
        assets = list(weights.keys())
        w = np.array([weights[a] for a in assets])
        cov = cov_matrix.loc[assets, assets].values
        port_vol = np.sqrt(w @ cov @ w)
        mcr = (cov @ w) / port_vol
        ccr = w * mcr
        pcr = ccr / port_vol
        return {
            'portfolio_vol': port_vol,
            'risk_contributions': dict(zip(assets, pcr))
        }
    def generate_report(self, portfolio_weights: dict = None,
                        benchmark_weights: dict = None,
                        sector_returns: pd.DataFrame = None,
                        factors: pd.DataFrame = None) -> None:
        """Generate comprehensive attribution report."""
        print("\n" + "="*70)
        print("PERFORMANCE ATTRIBUTION REPORT")
        print("="*70)
        # Basic metrics
        metrics = self.basic_metrics()
        print("\n--- PERFORMANCE SUMMARY ---")
        print(f"Portfolio Return: {metrics['portfolio_return']*100:>10.2f}%")
        print(f"Benchmark Return: {metrics['benchmark_return']*100:>10.2f}%")
        print(f"Active Return: {metrics['active_return']*100:>10.2f}%")
        print(f"Tracking Error: {metrics['tracking_error']*100:>10.2f}%")
        print(f"Information Ratio: {metrics['information_ratio']:>10.2f}")
        print(f"Hit Rate: {metrics['hit_rate']*100:>10.1f}%")
        # Brinson attribution
        if portfolio_weights and benchmark_weights and sector_returns is not None:
            brinson = self.brinson_attribution(portfolio_weights, benchmark_weights, sector_returns)
            print("\n--- BRINSON ATTRIBUTION ---")
            print(f"Allocation Effect: {brinson['Allocation'].sum()*100:>10.2f}%")
            print(f"Selection Effect: {brinson['Selection'].sum()*100:>10.2f}%")
            print(f"Interaction Effect: {brinson['Interaction'].sum()*100:>10.2f}%")
        # Factor attribution
        if factors is not None:
            factor_attr = self.factor_attribution(factors)
            print("\n--- FACTOR ATTRIBUTION ---")
            print(f"Alpha: {factor_attr['alpha']*100:>10.2f}%")
            print(f"R-squared: {factor_attr['r_squared']*100:>10.1f}%")
            print("\nFactor Exposures:")
            for f, b in factor_attr['betas'].items():
                print(f" {f:15}: {b:>10.3f}")
        print("\n" + "="*70)
# Demo
system = PerformanceAttributionSystem(portfolio_returns, benchmark_returns)
system.generate_report(
    portfolio_weights=portfolio_weights,
    benchmark_weights=benchmark_weights,
    sector_returns=sector_returns,
    factors=factors_df
)
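The three Brinson formulas used in brinson_attribution can be sanity-checked on a small example. The sketch below uses three hypothetical sectors with invented weights and returns, and assumes both weight vectors sum to 1 and that the benchmark total return equals the weighted sum of benchmark sector returns. Under those assumptions the three effects should add up exactly to the active return.

```python
import numpy as np

# Hypothetical three-sector example; all weights and returns are illustrative.
wp = np.array([0.50, 0.30, 0.20])   # portfolio sector weights (sum to 1)
wb = np.array([0.40, 0.40, 0.20])   # benchmark sector weights (sum to 1)
rp = np.array([0.08, 0.02, 0.05])   # portfolio return within each sector
rb = np.array([0.06, 0.03, 0.05])   # benchmark return within each sector

bench_total = wb @ rb               # benchmark total return
port_total = wp @ rp                # portfolio total return

# Same formulas as in brinson_attribution, applied per sector.
allocation = (wp - wb) * (rb - bench_total)
selection = wb * (rp - rb)
interaction = (wp - wb) * (rp - rb)

total_effect = allocation.sum() + selection.sum() + interaction.sum()
active_return = port_total - bench_total

print("allocation by sector:", allocation)
print("selection by sector:", selection)
print("interaction by sector:", interaction)
print("sum of effects:", total_effect, "active return:", active_return)
```

If the benchmark return is taken from a separate return series rather than from the weighted sector returns, the effects may no longer sum exactly to the active return; any gap is worth reporting as a residual.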
Key Takeaways
What You Learned
1. Attribution Basics
- Active return = portfolio return - benchmark return
- Information Ratio measures risk-adjusted active return
- Hit rate and win/loss ratio indicate consistency
2. Brinson Attribution
- Allocation effect: Value from sector weighting
- Selection effect: Value from security selection
- Interaction effect: Combined allocation and selection
3. Factor Attribution
- Regression decomposes returns by factor exposure
- Alpha is factor-adjusted excess return
- R-squared shows explanatory power of factors
4. Risk Attribution
- Marginal contribution measures sensitivity to weight changes
- Component contributions sum to total volatility
- Identifies concentration risk
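The risk attribution identities above can be verified numerically. This is a minimal sketch with a hypothetical three-asset covariance matrix (the volatilities, correlations, and weights are assumptions chosen for illustration); the variable names follow the risk_attribution method.

```python
import numpy as np

# Hypothetical annualized volatilities and correlations for three assets.
vols = np.array([0.20, 0.10, 0.15])
corr = np.array([[1.0, 0.2, 0.5],
                 [0.2, 1.0, 0.1],
                 [0.5, 0.1, 1.0]])
cov = np.outer(vols, vols) * corr
w = np.array([0.5, 0.3, 0.2])       # illustrative portfolio weights

port_vol = np.sqrt(w @ cov @ w)
mcr = (cov @ w) / port_vol          # marginal contribution to risk
ccr = w * mcr                       # component contribution to risk
pcr = ccr / port_vol                # percentage contribution to risk

print("portfolio vol:", port_vol)
print("component contributions:", ccr, "sum:", ccr.sum())
print("percentage contributions:", pcr, "sum:", pcr.sum())
```

Comparing pcr with w shows concentration directly: an asset whose share of risk is well above its weight is carrying more of the portfolio's volatility than its allocation suggests.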
Coming Up Next
In Module 12: Building Dashboards, we'll explore:
- Dashboard design principles
- Plotly and Dash fundamentals
- Interactive financial visualizations
- Real-time updates with callbacks
Congratulations on completing Module 11!
Module 12: Building Dashboards
Course 3: Quantitative Finance & Portfolio Theory
Part 4: Simulation & Analytics
Learning Objectives
By the end of this module, you will be able to:
- Design effective financial dashboards following best practices
- Create interactive visualizations using Plotly
- Build multi-component dashboard layouts
- Understand Dash callback architecture for interactivity
| Attribute | Value |
|---|---|
| Duration | ~2.5 hours |
| Exercises | 6 (3 guided + 3 open-ended) |
| Prerequisites | Modules 4-8: Portfolio Theory & Risk |
Setup and Imports
import numpy as np
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import yfinance as yf
from scipy import stats
import warnings
warnings.filterwarnings('ignore')
pd.set_option('display.float_format', lambda x: f'{x:.4f}')
print('Module 12: Building Dashboards - Ready!')
print('\nNote: Full Dash apps are typically run as standalone scripts outside Jupyter.')
print('This notebook demonstrates Plotly visualizations and Dash concepts.')
Load Data
# Download data for visualization examples
tickers = ['SPY', 'QQQ', 'TLT', 'GLD']
data = yf.download(tickers, start='2020-01-01', end='2024-01-01', progress=False)
if isinstance(data.columns, pd.MultiIndex):
    prices = data['Adj Close'] if 'Adj Close' in data.columns.get_level_values(0) else data['Close']
else:
    prices = data
returns = prices.pct_change().dropna()
print(f'Data loaded: {len(returns)} trading days')
Section 12.1: Dashboard Design Principles
Effective dashboards follow clear design principles that maximize information transfer.
In this section, you will learn:
- Information hierarchy in dashboard design
- Color conventions for financial data
- Layout best practices
12.1.1 Information Hierarchy
Financial dashboards should follow a clear structure:
- Summary Metrics (top): Key numbers at a glance
- Trend Charts (middle): Performance over time
- Detailed Analysis (bottom): Drill-down capabilities
| Principle | Application |
|---|---|
| Clarity | Show the most important metrics prominently |
| Context | Always include benchmarks for comparison |
| Consistency | Use consistent colors for gains (green) and losses (red) |
| Actionability | Highlight items requiring attention |
| Simplicity | Avoid chart junk and unnecessary complexity |
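One way to apply the consistency principle from the table above is to route every chart's colors through a single helper. The sketch below is an illustration only: the function name gain_loss_colors and the gray color for zero values are assumptions, not part of the course code.

```python
GAIN_COLOR = "green"
LOSS_COLOR = "red"
NEUTRAL_COLOR = "gray"   # assumption: exact zeros get a neutral color


def gain_loss_colors(values):
    """Map each number to a color so gains and losses look the same on every chart."""
    colors = []
    for v in values:
        if v > 0:
            colors.append(GAIN_COLOR)
        elif v < 0:
            colors.append(LOSS_COLOR)
        else:
            colors.append(NEUTRAL_COLOR)
    return colors


# Illustrative monthly P&L figures (made up).
monthly_pnl = [1.2, -0.4, 0.0, 3.1]
bar_colors = gain_loss_colors(monthly_pnl)
print(bar_colors)
```

The resulting list can be passed wherever a chart accepts per-bar colors, which keeps the green/red convention in one place instead of repeating the conditional in each plotting function.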
Section 12.2: Plotly Fundamentals
Plotly creates interactive visualizations that work in Jupyter and can be embedded in Dash apps.
In this section, you will learn:
- Creating figures with go.Figure
- Adding multiple traces
- Customizing layouts and styling
# Basic line chart with Plotly
fig = go.Figure()
# Normalize prices to 100
normalized = prices / prices.iloc[0] * 100
for col in normalized.columns:
    fig.add_trace(go.Scatter(
        x=normalized.index,
        y=normalized[col],
        name=col,
        mode='lines'
    ))
fig.update_layout(
    title='Asset Performance (Normalized to 100)',
    xaxis_title='Date',
    yaxis_title='Value',
    template='plotly_white',
    hovermode='x unified',
    legend=dict(orientation='h', yanchor='bottom', y=1.02)
)
fig.show()
12.2.1 Risk-Return Scatter Plot
# Calculate metrics
ann_returns = returns.mean() * 252
ann_vol = returns.std() * np.sqrt(252)
sharpe = ann_returns / ann_vol
# Create scatter plot
fig = go.Figure()
fig.add_trace(go.Scatter(
    x=ann_vol * 100,
    y=ann_returns * 100,
    mode='markers+text',
    marker=dict(
        # Clip so marker sizes stay positive when a Sharpe ratio is strongly negative
        size=(sharpe * 30 + 20).clip(lower=5),
        color=sharpe,
        colorscale='RdYlGn',
        colorbar=dict(title='Sharpe'),
        line=dict(width=2, color='white')
    ),
    text=returns.columns,
    textposition='top center',
    hovertemplate=(
        '<b>%{text}</b><br>' +
        'Return: %{y:.1f}%<br>' +
        'Volatility: %{x:.1f}%<br>' +
        '<extra></extra>'
    )
))
fig.update_layout(
    title='Risk-Return Profile',
    xaxis_title='Volatility (%)',
    yaxis_title='Return (%)',
    template='plotly_white',
    showlegend=False
)
fig.show()
Exercise 12.1: Correlation Heatmap (Guided)
Your Task: Create an interactive correlation heatmap using Plotly.
Fill in the blanks to complete the heatmap:
Click to reveal solution
def create_correlation_heatmap(returns: pd.DataFrame) -> go.Figure:
    """
    Create an interactive correlation heatmap.
    """
    # Calculate correlation matrix
    corr_matrix = returns.corr()
    # Create heatmap
    fig = go.Figure(
        data=go.Heatmap(
            z=corr_matrix.values,
            x=corr_matrix.columns,
            y=corr_matrix.index,
            colorscale='RdYlGn',
            zmid=0,
            text=np.round(corr_matrix.values, 2),
            texttemplate='%{text}',
            hovertemplate='%{x} vs %{y}<br>Correlation: %{z:.3f}<extra></extra>'
        )
    )
    fig.update_layout(
        title='Asset Correlation Matrix',
        template='plotly_white',
        width=500,
        height=500
    )
    return fig
# Test
heatmap = create_correlation_heatmap(returns)
heatmap.show()
Section 12.3: Financial Charts
In this section, you will learn:
- Candlestick charts with technical indicators
- Multi-panel layouts with subplots
- KPI indicator cards
# Download OHLC data for candlestick chart
spy_ohlc = yf.download('SPY', start='2023-10-01', end='2024-01-01', progress=False)
if isinstance(spy_ohlc.columns, pd.MultiIndex):
    spy_ohlc.columns = spy_ohlc.columns.get_level_values(0)
# Create candlestick with volume
fig = make_subplots(
    rows=2, cols=1,
    shared_xaxes=True,
    vertical_spacing=0.03,
    row_heights=[0.7, 0.3]
)
# Candlestick chart
fig.add_trace(
    go.Candlestick(
        x=spy_ohlc.index,
        open=spy_ohlc['Open'],
        high=spy_ohlc['High'],
        low=spy_ohlc['Low'],
        close=spy_ohlc['Close'],
        name='SPY'
    ),
    row=1, col=1
)
# Add moving averages (MA50 is defined only near the end of this three-month window)
spy_ohlc['MA20'] = spy_ohlc['Close'].rolling(20).mean()
spy_ohlc['MA50'] = spy_ohlc['Close'].rolling(50).mean()
fig.add_trace(
    go.Scatter(x=spy_ohlc.index, y=spy_ohlc['MA20'], name='MA20', line=dict(color='orange', width=1)),
    row=1, col=1
)
fig.add_trace(
    go.Scatter(x=spy_ohlc.index, y=spy_ohlc['MA50'], name='MA50', line=dict(color='purple', width=1)),
    row=1, col=1
)
# Volume bars: red when the close is below the open, green otherwise
colors = ['red' if spy_ohlc['Close'].iloc[i] < spy_ohlc['Open'].iloc[i] else 'green'
          for i in range(len(spy_ohlc))]
fig.add_trace(
    go.Bar(x=spy_ohlc.index, y=spy_ohlc['Volume'], name='Volume', marker_color=colors),
    row=2, col=1
)
fig.update_layout(
    title='SPY Price Chart with Volume',
    xaxis_rangeslider_visible=False,
    template='plotly_white',
    height=600
)
fig.show()
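The chart above computes 20-day and 50-day moving averages on roughly three months of data, so it helps to know how many points each average actually has. A minimal sketch, using 63 synthetic closing prices as a stand-in for the downloaded window (the price path is invented):

```python
import numpy as np
import pandas as pd

# 63 synthetic closes stand in for about three months of daily data.
close = pd.Series(np.linspace(100.0, 110.0, 63))

ma20 = close.rolling(20).mean()
ma50 = close.rolling(50).mean()

# A rolling mean is NaN until a full window of observations is available.
valid_ma20 = int(ma20.notna().sum())
valid_ma50 = int(ma50.notna().sum())

print(f"MA20 defined on {valid_ma20} of {len(close)} days")
print(f"MA50 defined on {valid_ma50} of {len(close)} days")
```

If a long moving average matters for the chart, one option is to download extra history before the display window and then slice the data to the dates being plotted.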
12.3.1 KPI Cards
def create_kpi_cards(metrics: dict) -> go.Figure:
    """
    Create KPI indicator cards.
    Args:
        metrics: Dict of {name: {'value': x, 'reference': y, 'suffix': '%'}}
    """
    n_metrics = len(metrics)
    fig = make_subplots(
        rows=1, cols=n_metrics,
        specs=[[{'type': 'indicator'}] * n_metrics]
    )
    for i, (name, data) in enumerate(metrics.items()):
        fig.add_trace(
            go.Indicator(
                mode='number+delta',
                value=data['value'],
                title={'text': name, 'font': {'size': 14}},
                delta={'reference': data.get('reference', 0),
                       'relative': data.get('relative', False)},
                number={'suffix': data.get('suffix', ''),
                        'font': {'size': 24}}
            ),
            row=1, col=i+1
        )
    fig.update_layout(height=200, template='plotly_white')
    return fig
# Calculate metrics
weights = {'SPY': 0.4, 'QQQ': 0.25, 'TLT': 0.25, 'GLD': 0.1}
w = np.array([weights[t] for t in prices.columns])
port_ret = (returns * w).sum(axis=1)
metrics = {
    # Compounded over the whole loaded sample, not just the current calendar year
    'Total Return': {
        'value': ((1 + port_ret).prod() - 1) * 100,
        'reference': ((1 + returns['SPY']).prod() - 1) * 100,
        'suffix': '%'
    },
    'Volatility': {
        'value': port_ret.std() * np.sqrt(252) * 100,
        'reference': returns['SPY'].std() * np.sqrt(252) * 100,
        'suffix': '%'
    },
    'Sharpe Ratio': {
        'value': (port_ret.mean() * 252) / (port_ret.std() * np.sqrt(252)),
        'reference': 1.0
    }
}
kpi_fig = create_kpi_cards(metrics)
kpi_fig.show()
Exercise 12.2: Portfolio Dashboard Layout (Guided)
Your Task: Create a multi-panel dashboard using Plotly subplots.
Fill in the blanks to complete the dashboard:
Click to reveal solution
def create_portfolio_dashboard(prices: pd.DataFrame, returns: pd.DataFrame,
                               weights: dict) -> go.Figure:
    """
    Create a comprehensive portfolio dashboard.
    """
    w = np.array([weights[t] for t in returns.columns])
    port_returns = (returns * w).sum(axis=1)
    port_cum = (1 + port_returns).cumprod()
    # Create 2x2 subplot layout
    fig = make_subplots(
        rows=2, cols=2,
        subplot_titles=('Portfolio Performance', 'Asset Allocation',
                        'Drawdown', 'Return Distribution'),
        specs=[
            [{'type': 'scatter'}, {'type': 'pie'}],
            [{'type': 'scatter'}, {'type': 'histogram'}]
        ]
    )
    # Performance chart
    fig.add_trace(
        go.Scatter(x=port_cum.index, y=port_cum * 100, name='Portfolio',
                   line=dict(color='steelblue', width=2)),
        row=1, col=1
    )
    # Pie chart for allocation
    fig.add_trace(
        go.Pie(labels=list(weights.keys()), values=list(weights.values()),
               hole=0.4),
        row=1, col=2
    )
    # Drawdown chart
    running_max = port_cum.cummax()
    drawdown = (port_cum - running_max) / running_max
    fig.add_trace(
        go.Scatter(x=drawdown.index, y=drawdown * 100,
                   fill='tozeroy', fillcolor='rgba(255,0,0,0.3)',
                   line=dict(color='red', width=1), name='Drawdown'),
        row=2, col=1
    )
    # Histogram for return distribution
    fig.add_trace(
        go.Histogram(x=port_returns * 100, nbinsx=50,
                     marker_color='steelblue', name='Returns'),
        row=2, col=2
    )
    fig.update_layout(height=700, showlegend=False, template='plotly_white')
    return fig
# Test
weights = {'SPY': 0.4, 'QQQ': 0.25, 'TLT': 0.25, 'GLD': 0.1}
dashboard = create_portfolio_dashboard(prices, returns, weights)
dashboard.show()
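create_portfolio_dashboard builds its weight vector by looking up each ticker in returns.columns order. The sketch below illustrates why that lookup matters, using a one-row synthetic DataFrame whose columns are in alphabetical order (an assumption about how multi-ticker downloads are commonly laid out; the return values are invented).

```python
import numpy as np
import pandas as pd

# Synthetic one-day returns with alphabetically ordered columns.
returns = pd.DataFrame(
    {"GLD": [0.01], "QQQ": [0.02], "SPY": [0.03], "TLT": [-0.02]}
)
weights = {"SPY": 0.4, "QQQ": 0.25, "TLT": 0.25, "GLD": 0.1}

# Aligned: each column's weight is looked up by ticker name.
w_aligned = np.array([weights[t] for t in returns.columns])
aligned = float((returns * w_aligned).sum(axis=1).iloc[0])

# Misaligned: weights taken in dictionary order and applied by position,
# so GLD silently receives SPY's 40% weight.
w_positional = np.array(list(weights.values()))
misaligned = float((returns * w_positional).sum(axis=1).iloc[0])

print(f"aligned portfolio return:    {aligned:.4f}")
print(f"misaligned portfolio return: {misaligned:.4f}")
```

Both calculations run without error, which is what makes the positional version risky: the only symptom is a wrong number.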
Exercise 12.3: Risk Dashboard (Guided)
Your Task: Create a risk monitoring dashboard with VaR, drawdown, and volatility panels.
Fill in the blanks to complete the dashboard:
Click to reveal solution
def create_risk_dashboard(returns: pd.Series, window: int = 21,
                          confidence: float = 0.95) -> go.Figure:
    """
    Create a risk monitoring dashboard.
    """
    # Calculate rolling volatility (annualized)
    rolling_vol = returns.rolling(window).std() * np.sqrt(252)
    # Calculate rolling VaR (using quantile)
    rolling_var = returns.rolling(window).quantile(1 - confidence) * -1
    # Calculate drawdown
    cum_returns = (1 + returns).cumprod()
    running_max = cum_returns.cummax()
    drawdown = (cum_returns - running_max) / running_max
    # Create 3-panel layout
    fig = make_subplots(
        rows=3, cols=1,
        shared_xaxes=True,
        subplot_titles=(
            f'{window}-Day Rolling VaR ({confidence*100:.0f}%)',
            'Drawdown',
            f'{window}-Day Rolling Volatility'
        )
    )
    fig.add_trace(
        go.Scatter(x=rolling_var.index, y=rolling_var * 100,
                   fill='tozeroy', fillcolor='rgba(255,100,100,0.3)',
                   line=dict(color='red', width=1)),
        row=1, col=1
    )
    fig.add_trace(
        go.Scatter(x=drawdown.index, y=drawdown * 100,
                   fill='tozeroy', fillcolor='rgba(255,0,0,0.3)',
                   line=dict(color='darkred', width=1)),
        row=2, col=1
    )
    fig.add_trace(
        go.Scatter(x=rolling_vol.index, y=rolling_vol * 100,
                   fill='tozeroy', fillcolor='rgba(255,165,0,0.3)',
                   line=dict(color='orange', width=1)),
        row=3, col=1
    )
    fig.update_layout(height=700, showlegend=False, template='plotly_white',
                      title='Risk Monitoring Dashboard')
    return fig
# Test
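weights = {'SPY': 0.4, 'QQQ': 0.25, 'TLT': 0.25, 'GLD': 0.1}
# Look up weights by column name; a bare positional array would follow the DataFrame's column order instead
port_returns = (returns * np.array([weights[t] for t in returns.columns])).sum(axis=1)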
risk_dash = create_risk_dashboard(port_returns)
risk_dash.show()
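The two risk series behind the dashboard can be inspected without any plotting. This is a minimal sketch on synthetic daily returns (the mean, volatility, and seed are arbitrary assumptions); the window and confidence level match the defaults of create_risk_dashboard.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
# Synthetic daily returns, for illustration only.
returns = pd.Series(rng.normal(0.0005, 0.01, size=252))

window, confidence = 21, 0.95

# Historical VaR: the (1 - confidence) quantile of returns in each window,
# sign-flipped so that a loss is reported as a positive number.
rolling_var = -returns.rolling(window).quantile(1 - confidence)

# Drawdown: fractional distance of cumulative wealth below its running peak.
wealth = (1 + returns).cumprod()
drawdown = wealth / wealth.cummax() - 1

print(f"Latest {window}-day VaR: {rolling_var.iloc[-1]:.2%}")
print(f"Maximum drawdown: {drawdown.min():.2%}")
```

With only 21 observations per window, the 5% quantile is determined by the one or two worst days in the window, so the rolling VaR line tends to move in steps. A longer window gives a smoother but slower-reacting estimate.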
Section 12.4: Dash Architecture
Dash enables full interactivity through callbacks - functions that update outputs when inputs change.
In this section, you will learn:
- Dash application structure
- Callbacks for interactivity
- Input/Output components
12.4.1 Dash Application Structure
from dash import Dash, html, dcc, Input, Output
app = Dash(__name__)
app.layout = html.Div([
    # Input components
    dcc.Dropdown(id='asset-selector', options=[...]),
    # Output components
    dcc.Graph(id='performance-chart')
])
@app.callback(
    Output('performance-chart', 'figure'),
    Input('asset-selector', 'value')
)
def update_chart(selected_asset):
    # Create and return figure
    return fig
if __name__ == '__main__':
    app.run(debug=True)  # older Dash releases used app.run_server(debug=True)
# Simulating interactive behavior in Jupyter
def create_interactive_chart(assets_to_show: list, lookback_days: int) -> go.Figure:
    """
    Create a chart that would be updated by Dash callbacks.
    """
    filtered_prices = prices[assets_to_show].iloc[-lookback_days:]
    normalized = filtered_prices / filtered_prices.iloc[0] * 100
    fig = go.Figure()
    for col in normalized.columns:
        fig.add_trace(go.Scatter(
            x=normalized.index,
            y=normalized[col],
            name=col,
            mode='lines'
        ))
    fig.update_layout(
        title=f'Performance Over Last {lookback_days} Days',
        xaxis_title='Date',
        yaxis_title='Normalized Value',
        template='plotly_white',
        hovermode='x unified'
    )
    return fig
# Simulate different states
print("Simulating: All assets, 252 days")
fig1 = create_interactive_chart(['SPY', 'QQQ', 'TLT', 'GLD'], 252)
fig1.show()
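A callback is, at its core, a plain function from input values to output values. The sketch below shows the body of such a function without importing Dash: it returns a figure as a plain dictionary with data and layout keys, a form that dcc.Graph accepts for its figure property. The PRICES and DATES constants are made-up stand-ins for real data, and in an actual app the function would carry the @app.callback decorator shown earlier.

```python
# Hypothetical price history keyed by ticker; the values are invented.
PRICES = {
    "SPY": [100.0, 102.0, 101.0, 105.0],
    "TLT": [50.0, 49.0, 49.5, 48.0],
}
DATES = ["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"]


def update_chart(selected_asset):
    """Return a Plotly-style figure dict for the selected ticker.

    In a Dash app this would be registered as a callback and re-run
    each time the dropdown value changes.
    """
    series = PRICES[selected_asset]
    normalized = [p / series[0] * 100 for p in series]   # rebase to 100
    return {
        "data": [{
            "type": "scatter",
            "mode": "lines",
            "x": DATES,
            "y": normalized,
            "name": selected_asset,
        }],
        "layout": {"title": {"text": f"{selected_asset} (normalized to 100)"}},
    }


# Simulate two dropdown selections.
fig_spy = update_chart("SPY")
fig_tlt = update_chart("TLT")
print(fig_spy["layout"]["title"]["text"], fig_spy["data"][0]["y"])
print(fig_tlt["layout"]["title"]["text"], fig_tlt["data"][0]["y"])
```

Keeping the callback body free of global state, as here, makes it easy to test outside a running server: each call depends only on its argument.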
Exercise 12.4: Monthly Returns Heatmap (Open-ended)
Your Task:
Build a function that:
- Creates a calendar heatmap of monthly returns
- Shows months as columns and years as rows
- Uses diverging color scale centered on zero
- Includes hover information
Your implementation:
Click to reveal solution
def create_monthly_returns_heatmap(returns: pd.Series, title: str = 'Monthly Returns') -> go.Figure:
    """
    Create a calendar heatmap of monthly returns.
    """
    # Calculate monthly returns by compounding daily returns
    # (pandas >= 2.2 prefers the 'ME' alias for month-end resampling)
    monthly = returns.resample('M').apply(lambda x: (1 + x).prod() - 1)
    # Create pivot table
    monthly_df = pd.DataFrame({
        'Year': monthly.index.year,
        'Month': monthly.index.month,
        'Return': monthly.values
    })
    pivot = monthly_df.pivot(index='Year', columns='Month', values='Return')
    months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
              'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
    fig = go.Figure(data=go.Heatmap(
        z=pivot.values * 100,
        # Label by actual month number so a sample that skips January stays aligned
        x=[months[m - 1] for m in pivot.columns],
        y=pivot.index,
        colorscale='RdYlGn',
        zmid=0,
        text=np.round(pivot.values * 100, 1),
        texttemplate='%{text}%',
        hovertemplate='%{y} %{x}<br>Return: %{z:.1f}%<extra></extra>',
        colorbar=dict(title='Return (%)')
    ))
    fig.update_layout(
        title=title,
        template='plotly_white',
        height=300
    )
    return fig
# Test
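weights = {'SPY': 0.4, 'QQQ': 0.25, 'TLT': 0.25, 'GLD': 0.1}
# Look up weights by column name; a bare positional array would follow the DataFrame's column order instead
port_returns = (returns * np.array([weights[t] for t in returns.columns])).sum(axis=1)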
heatmap = create_monthly_returns_heatmap(port_returns, 'Portfolio Monthly Returns')
heatmap.show()
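The year-by-month grid behind the heatmap can also be built with a groupby on the calendar year and month, which avoids resampling aliases altogether. This is an alternative sketch, not the solution's method, and it runs on synthetic business-day returns (dates, mean, and volatility are assumptions).

```python
import numpy as np
import pandas as pd

# Synthetic business-day returns covering two calendar years.
idx = pd.bdate_range("2022-01-03", "2023-12-29")
rng = np.random.default_rng(1)
daily = pd.Series(rng.normal(0.0003, 0.01, size=len(idx)), index=idx)

# Compound daily returns within each (year, month) bucket.
monthly = (1 + daily).groupby([daily.index.year, daily.index.month]).prod() - 1
monthly.index.names = ["Year", "Month"]

# Years as rows, months as columns: the grid a calendar heatmap plots.
pivot = monthly.unstack("Month")

print(pivot.round(4))
```

Because each month is compounded rather than summed, chaining the monthly returns reproduces the total compounded return of the daily series.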
Exercise 12.5: Performance Comparison Chart (Open-ended)
Your Task:
Build a function that:
- Compares multiple portfolios against a benchmark
- Shows cumulative returns over time
- Includes a summary table with key metrics
- Uses appropriate color coding
Your implementation:
Click to reveal solution
def create_performance_comparison(returns: pd.DataFrame, portfolios: dict,
                                  benchmark: str = 'SPY') -> go.Figure:
    """
    Create a performance comparison chart.
    Args:
        portfolios: Dict of {name: {ticker: weight}}
        benchmark: Benchmark ticker
    """
    # Calculate portfolio returns
    port_returns = {}
    for name, weights in portfolios.items():
        w = np.array([weights.get(t, 0) for t in returns.columns])
        port_returns[name] = (returns * w).sum(axis=1)
    # Add benchmark
    port_returns['Benchmark'] = returns[benchmark]
    # Calculate cumulative returns
    cum_returns = {name: (1 + ret).cumprod() for name, ret in port_returns.items()}
    # Create figure
    colors = ['steelblue', 'orange', 'green', 'red', 'purple', 'gray']
    fig = go.Figure()
    for i, (name, cum_ret) in enumerate(cum_returns.items()):
        line_style = dict(dash='dash') if name == 'Benchmark' else dict()
        fig.add_trace(go.Scatter(
            x=cum_ret.index,
            y=(cum_ret - 1) * 100,
            name=name,
            mode='lines',
            line=dict(color=colors[i % len(colors)], **line_style)
        ))
    # Add metrics annotation
    metrics_text = "<b>Annualized Metrics:</b><br>"
    for name, ret in port_returns.items():
        ann_ret = ret.mean() * 252 * 100
        sharpe = (ret.mean() * 252) / (ret.std() * np.sqrt(252))
        metrics_text += f"{name}: {ann_ret:.1f}% (SR: {sharpe:.2f})<br>"
    fig.add_annotation(
        x=0.02, y=0.98, xref='paper', yref='paper',
        text=metrics_text, showarrow=False,
        font=dict(size=10), align='left',
        bgcolor='white', bordercolor='gray', borderwidth=1
    )
    fig.update_layout(
        title='Portfolio Performance Comparison',
        xaxis_title='Date',
        yaxis_title='Cumulative Return (%)',
        template='plotly_white',
        hovermode='x unified',
        legend=dict(orientation='h', y=1.1)
    )
    return fig
# Test
portfolios = {
    'Aggressive': {'SPY': 0.7, 'QQQ': 0.3, 'TLT': 0.0, 'GLD': 0.0},
    'Balanced': {'SPY': 0.4, 'QQQ': 0.2, 'TLT': 0.3, 'GLD': 0.1},
    'Defensive': {'SPY': 0.2, 'QQQ': 0.0, 'TLT': 0.5, 'GLD': 0.3}
}
comparison = create_performance_comparison(returns, portfolios)
comparison.show()
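The annotation in the chart above annualizes the Sharpe ratio as (mean × 252) / (std × √252), with the risk-free rate implicitly set to zero. A short sketch on synthetic returns (arbitrary parameters and seed) shows that this is the same as scaling the daily ratio by √252.

```python
import numpy as np

rng = np.random.default_rng(7)
daily = rng.normal(0.0004, 0.01, size=1000)   # synthetic daily returns

ann_return = daily.mean() * 252
ann_vol = daily.std(ddof=1) * np.sqrt(252)    # ddof=1 matches the pandas default
sharpe_annual = ann_return / ann_vol

# Equivalent route: daily ratio scaled by the square root of 252.
sharpe_daily = daily.mean() / daily.std(ddof=1)
sharpe_scaled = sharpe_daily * np.sqrt(252)

print(f"annualized Sharpe:   {sharpe_annual:.4f}")
print(f"daily Sharpe x sqrt: {sharpe_scaled:.4f}")
```

Both routes rest on the assumption that daily returns are uncorrelated over time; with autocorrelated returns the √252 scaling can overstate or understate the annual figure.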
Exercise 12.6: Complete Dashboard Class (Open-ended)
Your Task:
Build a comprehensive dashboard class that includes:
- Performance visualization methods
- Risk charts (drawdown, VaR, volatility)
- Allocation views (pie, bar)
- Full dashboard layout generation
- Dark/light theme support
Your implementation:
Click to reveal solution
class QuantDashboard:
    """
    Complete quantitative finance dashboard builder.
    """
    def __init__(self, returns: pd.DataFrame, prices: pd.DataFrame = None,
                 benchmark: str = None):
        self.returns = returns
        self.prices = prices
        self.benchmark = benchmark
        self.template = 'plotly_white'
    def set_dark_theme(self):
        """Switch to dark theme."""
        self.template = 'plotly_dark'
    def performance_chart(self, normalize: bool = True) -> go.Figure:
        """Create performance time series chart."""
        if self.prices is not None:
            data = self.prices
        else:
            data = (1 + self.returns).cumprod()
        if normalize:
            data = data / data.iloc[0] * 100
        fig = go.Figure()
        for col in data.columns:
            fig.add_trace(go.Scatter(x=data.index, y=data[col], name=col, mode='lines'))
        fig.update_layout(
            title='Performance Over Time',
            xaxis_title='Date',
            yaxis_title='Normalized Value' if normalize else 'Value',
            template=self.template,
            hovermode='x unified'
        )
        return fig
    def drawdown_chart(self) -> go.Figure:
        """Create drawdown chart."""
        if isinstance(self.returns, pd.DataFrame):
            # Collapse multiple assets into an equal-weighted average return
            ret = self.returns.mean(axis=1)
        else:
            ret = self.returns
        cum = (1 + ret).cumprod()
        running_max = cum.cummax()
        drawdown = (cum - running_max) / running_max
        fig = go.Figure(go.Scatter(
            x=drawdown.index, y=drawdown * 100,
            fill='tozeroy', fillcolor='rgba(255,0,0,0.3)',
            line=dict(color='red', width=1)
        ))
        fig.update_layout(
            title='Drawdown',
            xaxis_title='Date',
            yaxis_title='Drawdown (%)',
            template=self.template
        )
        return fig
    def correlation_heatmap(self) -> go.Figure:
        """Create correlation heatmap."""
        corr = self.returns.corr()
        fig = go.Figure(go.Heatmap(
            z=corr.values, x=corr.columns, y=corr.index,
            colorscale='RdYlGn', zmid=0,
            text=np.round(corr.values, 2), texttemplate='%{text}'
        ))
        fig.update_layout(title='Correlation Matrix', template=self.template)
        return fig
    def risk_return_scatter(self) -> go.Figure:
        """Create risk-return scatter plot."""
        ann_ret = self.returns.mean() * 252
        ann_vol = self.returns.std() * np.sqrt(252)
        sharpe = ann_ret / ann_vol
        fig = go.Figure(go.Scatter(
            x=ann_vol * 100, y=ann_ret * 100,
            mode='markers+text',
            text=self.returns.columns,
            textposition='top center',
            # Clip so marker sizes stay positive when a Sharpe ratio is strongly negative
            marker=dict(size=(sharpe * 20 + 15).clip(lower=5), color=sharpe, colorscale='RdYlGn')
        ))
        fig.update_layout(
            title='Risk-Return Profile',
            xaxis_title='Volatility (%)',
            yaxis_title='Return (%)',
            template=self.template
        )
        return fig
    def full_dashboard(self) -> go.Figure:
        """Create comprehensive dashboard with all components."""
        fig = make_subplots(
            rows=2, cols=2,
            subplot_titles=('Performance', 'Risk-Return', 'Drawdown', 'Correlation'),
            specs=[
                [{'type': 'scatter'}, {'type': 'scatter'}],
                [{'type': 'scatter'}, {'type': 'heatmap'}]
            ]
        )
        # Get individual figures
        perf = self.performance_chart()
        rr = self.risk_return_scatter()
        dd = self.drawdown_chart()
        corr = self.correlation_heatmap()
        # Add traces
        for trace in perf.data:
            fig.add_trace(trace, row=1, col=1)
        for trace in rr.data:
            fig.add_trace(trace, row=1, col=2)
        for trace in dd.data:
            fig.add_trace(trace, row=2, col=1)
        for trace in corr.data:
            fig.add_trace(trace, row=2, col=2)
        fig.update_layout(
            height=800, showlegend=False, template=self.template,
            title='Quantitative Finance Dashboard'
        )
        return fig
# Demo
dashboard = QuantDashboard(returns, prices)
full_dash = dashboard.full_dashboard()
full_dash.show()
Module Project: Portfolio Analytics Dashboard
Put together everything you've learned!
Your Challenge:
Build a complete portfolio analytics dashboard that includes:
1. KPI cards showing key metrics (return, volatility, Sharpe)
2. Performance chart with benchmark comparison
3. Asset allocation visualization (pie chart)
4. Risk metrics panel (drawdown, VaR, volatility)
5. Monthly returns heatmap
6. Correlation matrix
# YOUR CODE HERE - Module Project
Click to reveal solution
class PortfolioAnalyticsDashboard:
"""
Complete portfolio analytics dashboard.
"""
def __init__(self, returns: pd.DataFrame, weights: dict, benchmark: str = 'SPY'):
self.returns = returns
self.weights = weights
self.benchmark = benchmark
self.template = 'plotly_white'
# Calculate portfolio returns
w = np.array([weights.get(t, 0) for t in returns.columns])
self.port_returns = (returns * w).sum(axis=1)
self.bench_returns = returns[benchmark]
def _calculate_metrics(self) -> dict:
"""Calculate key portfolio metrics."""
return {
'total_return': (1 + self.port_returns).prod() - 1,
'ann_return': self.port_returns.mean() * 252,
'ann_vol': self.port_returns.std() * np.sqrt(252),
'sharpe': (self.port_returns.mean() * 252) / (self.port_returns.std() * np.sqrt(252)),
'max_dd': ((1 + self.port_returns).cumprod() /
(1 + self.port_returns).cumprod().cummax() - 1).min()
}
def generate_dashboard(self) -> go.Figure:
"""Generate complete dashboard."""
metrics = self._calculate_metrics()
fig = make_subplots(
rows=3, cols=3,
subplot_titles=[
'Total Return', 'Volatility', 'Sharpe Ratio',
'Performance vs Benchmark', 'Asset Allocation', 'Drawdown',
'Monthly Returns', 'Correlation Matrix', 'Risk-Return'
],
specs=[
[{'type': 'indicator'}, {'type': 'indicator'}, {'type': 'indicator'}],
[{'type': 'scatter'}, {'type': 'pie'}, {'type': 'scatter'}],
[{'type': 'heatmap'}, {'type': 'heatmap'}, {'type': 'scatter'}]
],
vertical_spacing=0.1,
horizontal_spacing=0.1
)
# KPI Cards
fig.add_trace(go.Indicator(
mode='number', value=metrics['total_return'] * 100,
number={'suffix': '%', 'font': {'size': 24}}
), row=1, col=1)
fig.add_trace(go.Indicator(
mode='number', value=metrics['ann_vol'] * 100,
number={'suffix': '%', 'font': {'size': 24}}
), row=1, col=2)
fig.add_trace(go.Indicator(
mode='number', value=metrics['sharpe'],
number={'font': {'size': 24}}
), row=1, col=3)
# Performance
port_cum = (1 + self.port_returns).cumprod()
bench_cum = (1 + self.bench_returns).cumprod()
fig.add_trace(go.Scatter(x=port_cum.index, y=port_cum, name='Portfolio',
line=dict(color='steelblue')), row=2, col=1)
fig.add_trace(go.Scatter(x=bench_cum.index, y=bench_cum, name='Benchmark',
line=dict(color='gray', dash='dash')), row=2, col=1)
# Allocation
active_weights = {k: v for k, v in self.weights.items() if v > 0}
fig.add_trace(go.Pie(labels=list(active_weights.keys()),
values=list(active_weights.values()), hole=0.4), row=2, col=2)
# Drawdown
drawdown = (port_cum / port_cum.cummax() - 1) * 100
fig.add_trace(go.Scatter(x=drawdown.index, y=drawdown,
fill='tozeroy', fillcolor='rgba(255,0,0,0.3)',
line=dict(color='red')), row=2, col=3)
# Monthly returns heatmap
monthly = self.port_returns.resample('M').apply(lambda x: (1 + x).prod() - 1)
monthly_df = pd.DataFrame({'Year': monthly.index.year, 'Month': monthly.index.month,
'Return': monthly.values})
pivot = monthly_df.pivot(index='Year', columns='Month', values='Return')
fig.add_trace(go.Heatmap(z=pivot.values * 100, x=pivot.columns, y=pivot.index,
colorscale='RdYlGn', zmid=0), row=3, col=1)
# Correlation
corr = self.returns.corr()
fig.add_trace(go.Heatmap(z=corr.values, x=corr.columns, y=corr.index,
colorscale='RdYlGn', zmid=0), row=3, col=2)
# Risk-Return
ann_ret = self.returns.mean() * 252
ann_vol = self.returns.std() * np.sqrt(252)
fig.add_trace(go.Scatter(
x=ann_vol * 100, y=ann_ret * 100,
mode='markers+text', text=self.returns.columns,
textposition='top center',
marker=dict(size=15, color=ann_ret / ann_vol, colorscale='RdYlGn')
), row=3, col=3)
fig.update_layout(
height=1000, template=self.template, showlegend=False,
title=f'Portfolio Analytics Dashboard | Return: {metrics["ann_return"]*100:.1f}% | Sharpe: {metrics["sharpe"]:.2f}'
)
return fig
# Demo
weights = {'SPY': 0.4, 'QQQ': 0.25, 'TLT': 0.25, 'GLD': 0.1}
dashboard = PortfolioAnalyticsDashboard(returns, weights)
full_dash = dashboard.generate_dashboard()
full_dash.show()
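The solution builds its weight vector with `weights.get(t, 0)`, so a ticker that has a weight but no return column is silently dropped, and weights that do not sum to 1 pass unnoticed. As a minimal sketch of a guard against both cases, the helper below validates the weights before they are used. The name `align_weights` and the tolerance are assumptions for illustration, not part of the course code.

```python
import pandas as pd


def align_weights(weights: dict, columns, tol: float = 1e-8) -> pd.Series:
    """Return weights ordered like `columns`, raising on silent mismatches."""
    missing = [t for t in weights if t not in columns]
    if missing:
        raise KeyError(f"Weights given for tickers with no return data: {missing}")
    # Tickers without an explicit weight get 0, as in the solution above.
    w = pd.Series({t: weights.get(t, 0.0) for t in columns}, dtype=float)
    total = w.sum()
    if abs(total - 1.0) > tol:
        raise ValueError(f"Weights sum to {total:.6f}, expected 1.0")
    return w


cols = ['GLD', 'QQQ', 'SPY', 'TLT']
w = align_weights({'SPY': 0.4, 'QQQ': 0.25, 'TLT': 0.25, 'GLD': 0.1}, cols)
print(w.to_dict())
```

The returned Series can be multiplied against `returns[cols]` in the same way as the NumPy array in the constructor.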
Key Takeaways
What You Learned
1. Dashboard Design
- Follow information hierarchy: summary -> trends -> details
- Use consistent color conventions (green=good, red=bad)
- Keep it simple and actionable
2. Plotly Fundamentals
- go.Figure() for custom charts
- make_subplots() for multi-panel layouts
- Templates for consistent styling
3. Financial Charts
- Candlestick charts for price data
- Heatmaps for correlations and calendar views
- Scatter plots for risk-return analysis
4. Dash Architecture
- Callbacks connect inputs to outputs
- Layout defines component structure
- Full interactivity without page refresh
Coming Up Next
In Module 13: Professional Reporting, we'll explore:
- Automated report generation
- PDF and Excel output
- Performance tear sheets
- Scheduled report delivery
Congratulations on completing Module 12!
Module 13: Professional Reporting
Course 3: Quantitative Finance & Portfolio Theory
Part 5: Production & Infrastructure
Learning Objectives
By the end of this module, you will be able to:
- Design professional financial reports for different audiences
- Automate PDF report generation with ReportLab
- Create formatted Excel workbooks with openpyxl
- Build performance tear sheets for strategy evaluation
| Attribute | Value |
|---|---|
| Duration | ~2.5 hours |
| Exercises | 6 (3 guided + 3 open-ended) |
| Prerequisites | Modules 11 and 12 |
Setup and Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
from datetime import datetime, timedelta
from scipy import stats
import io
import os
import warnings
warnings.filterwarnings('ignore')
# PDF generation
try:
from reportlab.lib import colors
from reportlab.lib.pagesizes import letter, A4
from reportlab.platypus import SimpleDocTemplate, Table, TableStyle, Paragraph, Spacer, Image
from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
from reportlab.lib.units import inch
PDF_AVAILABLE = True
except ImportError:
PDF_AVAILABLE = False
print("ReportLab not installed. PDF generation will be simulated.")
# Excel generation
try:
import openpyxl
from openpyxl.styles import Font, PatternFill, Alignment, Border, Side
from openpyxl.chart import LineChart, Reference, BarChart
EXCEL_AVAILABLE = True
except ImportError:
EXCEL_AVAILABLE = False
print("openpyxl not installed. Excel generation will be simulated.")
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
np.random.seed(42)
print('Module 13: Professional Reporting - Ready!')
Load Data
# Download data
tickers = ['SPY', 'QQQ', 'TLT', 'GLD']
data = yf.download(tickers, start='2020-01-01', end='2024-01-01', progress=False)
if isinstance(data.columns, pd.MultiIndex):
prices = data['Adj Close'] if 'Adj Close' in data.columns.get_level_values(0) else data['Close']
else:
prices = data
returns = prices.pct_change().dropna()
# Create sample portfolio
weights = {'SPY': 0.4, 'QQQ': 0.25, 'TLT': 0.25, 'GLD': 0.1}
w = np.array([weights[t] for t in tickers])
portfolio_returns = (returns[tickers] * w).sum(axis=1)
benchmark_returns = returns['SPY']
print(f'Data loaded: {len(returns)} trading days')
Section 13.1: Report Design Principles
Professional portfolio management requires clear, consistent reporting. Different audiences need different information at varying levels of detail.
In this section, you will learn:
- Types of financial reports
- Report structure best practices
- Calculating report metrics
13.1.1 Types of Financial Reports
| Report Type | Audience | Frequency | Content |
|---|---|---|---|
| Client Report | External | Monthly/Quarterly | Performance, holdings, commentary |
| Risk Report | Internal | Daily | VaR, limits, breaches |
| Regulatory Report | Regulators | Quarterly/Annual | Compliance, positions |
| Tear Sheet | Marketing | On-demand | Strategy summary |
13.1.2 Report Structure Best Practices
- Executive Summary: Key numbers at a glance
- Performance Section: Returns, attribution, benchmarks
- Risk Section: Volatility, VaR, drawdowns
- Holdings Section: Current positions, weights
- Appendix: Methodology, disclaimers
13.1.3 Calculating Report Metrics
def calculate_report_metrics(returns: pd.Series, benchmark_returns: pd.Series,
                             risk_free_rate: float = 0.0) -> dict:
    """
    Calculate all metrics needed for a performance report.

    Parameters:
    -----------
    returns : Series
        Daily portfolio returns
    benchmark_returns : Series
        Daily benchmark returns
    risk_free_rate : float
        Annual risk-free rate

    Returns:
    --------
    dict : Report metrics
    """
    # Cumulative returns
    cum_port = (1 + returns).cumprod()
    cum_bench = (1 + benchmark_returns).cumprod()
    # Period returns
    total_return = cum_port.iloc[-1] - 1
    bench_return = cum_bench.iloc[-1] - 1
    # Annualized metrics (geometric return, 252 trading days per year)
    n_years = len(returns) / 252
    ann_return = (1 + total_return) ** (1/n_years) - 1
    ann_vol = returns.std() * np.sqrt(252)
    # Risk-adjusted metrics
    sharpe = (ann_return - risk_free_rate) / ann_vol
    # Sortino (downside volatility approximated by the std of negative returns only)
    downside_returns = returns[returns < 0]
    downside_vol = downside_returns.std() * np.sqrt(252)
    sortino = (ann_return - risk_free_rate) / downside_vol
    # Drawdown
    running_max = cum_port.cummax()
    drawdown = (cum_port - running_max) / running_max
    max_dd = drawdown.min()
    # Calmar
    calmar = ann_return / abs(max_dd) if max_dd != 0 else 0
    # Relative metrics
    active_returns = returns - benchmark_returns
    tracking_error = active_returns.std() * np.sqrt(252)
    information_ratio = (active_returns.mean() * 252) / tracking_error
    # Beta and Jensen's alpha (both legs annualized geometrically, net of the risk-free rate)
    cov = np.cov(returns, benchmark_returns)[0, 1]
    var_bench = benchmark_returns.var()
    beta = cov / var_bench
    ann_bench_return = (1 + bench_return) ** (1/n_years) - 1
    alpha = (ann_return - risk_free_rate) - beta * (ann_bench_return - risk_free_rate)
    # VaR and ES (daily, reported as positive losses)
    var_95 = -np.percentile(returns, 5)
    es_95 = -returns[returns <= np.percentile(returns, 5)].mean()
    # Win rate
    win_rate = (returns > 0).mean()
    return {
        'total_return': total_return,
        'benchmark_return': bench_return,
        'active_return': total_return - bench_return,
        'ann_return': ann_return,
        'ann_volatility': ann_vol,
        'sharpe_ratio': sharpe,
        'sortino_ratio': sortino,
        'calmar_ratio': calmar,
        'max_drawdown': max_dd,
        'tracking_error': tracking_error,
        'information_ratio': information_ratio,
        'beta': beta,
        'alpha': alpha,
        'var_95': var_95,
        'es_95': es_95,
        'win_rate': win_rate,
        'n_periods': len(returns),
        'start_date': returns.index[0],
        'end_date': returns.index[-1]
    }
# Calculate metrics
metrics = calculate_report_metrics(portfolio_returns, benchmark_returns)
print("Portfolio Performance Metrics")
print("=" * 50)
print(f"\nPeriod: {metrics['start_date'].strftime('%Y-%m-%d')} to {metrics['end_date'].strftime('%Y-%m-%d')}")
print(f"\nReturn Metrics:")
print(f" Total Return: {metrics['total_return']*100:.2f}%")
print(f" Benchmark Return: {metrics['benchmark_return']*100:.2f}%")
print(f" Annualized: {metrics['ann_return']*100:.2f}%")
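The Sortino ratio in `calculate_report_metrics` uses the standard deviation of the negative returns only. A commonly used alternative, the downside deviation, takes the root-mean-square shortfall below a target across all observations, so days at or above the target count as zero shortfall. The sketch below compares the two on synthetic data. The helper name `downside_deviation`, the zero daily target, and the random sample are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def downside_deviation(returns: pd.Series, target: float = 0.0,
                       periods_per_year: int = 252) -> float:
    """Annualized root-mean-square shortfall below a per-period `target`."""
    shortfall = np.minimum(returns - target, 0.0)  # 0 when at or above the target
    return float(np.sqrt((shortfall ** 2).mean()) * np.sqrt(periods_per_year))


rng = np.random.default_rng(0)
r = pd.Series(rng.normal(0.0005, 0.01, 1000))  # synthetic daily returns
dd_all = downside_deviation(r)
dd_neg_only = r[r < 0].std() * np.sqrt(252)    # the approach used in the function above
print(f"Downside deviation: {dd_all:.4f}")
print(f"Std of negative returns: {dd_neg_only:.4f}")
```

The two denominators generally differ, so Sortino ratios are only comparable across reports that state which definition they use.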
Exercise 13.1: Report Metrics Calculator (Guided)
Your Task: Complete the function to calculate rolling performance metrics for a report.
Fill in the blanks to complete the function:
Solution:
def calculate_rolling_metrics(returns: pd.Series, window: int = 252) -> pd.DataFrame:
    """
    Calculate rolling performance metrics for reporting.
    """
    # Calculate rolling mean return (annualized)
    rolling_return = returns.rolling(window).mean() * 252
    # Calculate rolling volatility (annualized)
    rolling_vol = returns.rolling(window).std() * np.sqrt(252)
    # Calculate rolling Sharpe ratio
    rolling_sharpe = rolling_return / rolling_vol
    return pd.DataFrame({
        'return': rolling_return,
        'volatility': rolling_vol,
        'sharpe': rolling_sharpe
    })
# Test
rolling_metrics = calculate_rolling_metrics(portfolio_returns)
print(rolling_metrics.tail())
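The rolling metrics annualize by scaling the mean daily return by 252 (arithmetic), while `calculate_report_metrics` compounds the total return over the period (geometric). The two conventions can disagree noticeably when returns are volatile. As a small sketch on a constructed series, with helper names that are assumptions for illustration:

```python
import pandas as pd


def annualize_arithmetic(returns: pd.Series, periods_per_year: int = 252) -> float:
    """Mean per-period return scaled to a year (no compounding)."""
    return float(returns.mean() * periods_per_year)


def annualize_geometric(returns: pd.Series, periods_per_year: int = 252) -> float:
    """Compound growth rate implied by the full return path."""
    total_growth = (1 + returns).prod()
    n_years = len(returns) / periods_per_year
    return float(total_growth ** (1 / n_years) - 1)


# Alternating +2% / -2% days: the average return is zero, but the path loses money.
r = pd.Series([0.02, -0.02] * 126)
print(f"Arithmetic: {annualize_arithmetic(r):.4%}")
print(f"Geometric:  {annualize_geometric(r):.4%}")
```

A report should label which convention each number uses, since a rolling chart and a summary table built from the same data can otherwise appear inconsistent.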
Section 13.2: Automated PDF Reports
PDF reports provide professional, print-ready documents for client communications.
In this section, you will learn:
- ReportLab basics
- Creating tables and charts for PDFs
- Building multi-page reports
13.2.1 Creating Charts for Reports
def create_performance_chart(returns: pd.Series, benchmark_returns: pd.Series,
filename: str = 'performance.png') -> str:
"""
Create a performance chart for the report.
"""
cum_port = (1 + returns).cumprod() * 100
cum_bench = (1 + benchmark_returns).cumprod() * 100
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(cum_port.index, cum_port, label='Portfolio', linewidth=2, color='#2E86AB')
ax.plot(cum_bench.index, cum_bench, label='Benchmark', linewidth=1.5,
linestyle='--', color='gray')
ax.set_xlabel('Date')
ax.set_ylabel('Growth of $100')
ax.set_title('Portfolio Performance vs Benchmark')
ax.legend(loc='upper left')
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig(filename, dpi=150, bbox_inches='tight')
plt.close()
return filename
def create_drawdown_chart(returns: pd.Series, filename: str = 'drawdown.png') -> str:
"""
Create a drawdown chart for the report.
"""
cum_returns = (1 + returns).cumprod()
running_max = cum_returns.cummax()
drawdown = (cum_returns - running_max) / running_max
fig, ax = plt.subplots(figsize=(8, 3))
ax.fill_between(drawdown.index, drawdown * 100, 0, alpha=0.5, color='red')
ax.plot(drawdown.index, drawdown * 100, color='darkred', linewidth=1)
ax.set_xlabel('Date')
ax.set_ylabel('Drawdown (%)')
ax.set_title('Portfolio Drawdown')
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig(filename, dpi=150, bbox_inches='tight')
plt.close()
return filename
# Create charts
perf_chart = create_performance_chart(portfolio_returns, benchmark_returns)
dd_chart = create_drawdown_chart(portfolio_returns)
print(f"Charts created: {perf_chart}, {dd_chart}")
13.2.2 PDF Report Generation
def generate_pdf_report(metrics: dict, weights: dict,
filename: str = 'portfolio_report.pdf') -> str:
"""
Generate a professional PDF report.
"""
if not PDF_AVAILABLE:
print("PDF generation simulated (ReportLab not installed)")
return None
doc = SimpleDocTemplate(filename, pagesize=letter)
styles = getSampleStyleSheet()
story = []
# Title
title_style = ParagraphStyle(
'CustomTitle',
parent=styles['Heading1'],
fontSize=24,
spaceAfter=30,
alignment=1
)
story.append(Paragraph('Portfolio Performance Report', title_style))
# Report date
report_date = datetime.now().strftime('%B %d, %Y')
date_style = ParagraphStyle('DateStyle', parent=styles['Normal'],
fontSize=12, alignment=1)
story.append(Paragraph(f'Report Date: {report_date}', date_style))
story.append(Spacer(1, 20))
# Summary table
story.append(Paragraph('Executive Summary', styles['Heading2']))
story.append(Spacer(1, 10))
summary_data = [
['Metric', 'Value'],
['Total Return', f"{metrics['total_return']*100:.2f}%"],
['Benchmark Return', f"{metrics['benchmark_return']*100:.2f}%"],
['Sharpe Ratio', f"{metrics['sharpe_ratio']:.2f}"],
['Max Drawdown', f"{metrics['max_drawdown']*100:.2f}%"],
]
summary_table = Table(summary_data, colWidths=[2.5*inch, 2*inch])
summary_table.setStyle(TableStyle([
('BACKGROUND', (0, 0), (-1, 0), colors.HexColor('#2E86AB')),
('TEXTCOLOR', (0, 0), (-1, 0), colors.whitesmoke),
('ALIGN', (0, 0), (-1, -1), 'CENTER'),
('FONTNAME', (0, 0), (-1, 0), 'Helvetica-Bold'),
('GRID', (0, 0), (-1, -1), 1, colors.black),
]))
story.append(summary_table)
# Build PDF
doc.build(story)
return filename
if PDF_AVAILABLE:
pdf_file = generate_pdf_report(metrics, weights)
print(f"PDF report generated: {pdf_file}")
else:
print("PDF generation skipped (install reportlab to enable)")
Exercise 13.2: Monthly Summary Table (Guided)
Your Task: Complete the function to create a monthly returns summary table for PDF reports.
Fill in the blanks to complete the function:
Solution:
def create_monthly_summary(returns: pd.Series) -> pd.DataFrame:
    """
    Create monthly summary statistics for report table.
    """
    # Resample returns to monthly frequency and calculate compound return
    monthly_return = returns.resample('M').apply(lambda x: (1+x).prod() - 1)
    # Calculate monthly volatility (annualized)
    monthly_vol = returns.resample('M').std() * np.sqrt(252)
    # Find best day each month
    best_day = returns.resample('M').max()
    # Find worst day each month
    worst_day = returns.resample('M').min()
    return pd.DataFrame({
        'Return': monthly_return,
        'Volatility': monthly_vol,
        'Best Day': best_day,
        'Worst Day': worst_day
    })
# Test
summary = create_monthly_summary(portfolio_returns)
print(summary.tail())
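The `summary_data` list in `generate_pdf_report` shows that a ReportLab `Table` is built from a list of rows, so a DataFrame such as the monthly summary has to be converted before it can go into a PDF. The sketch below is one way to do that conversion. The helper name `dataframe_to_table_data` is hypothetical, and it assumes a date-indexed frame in which every column holds fractional values to be shown as percentages.

```python
import pandas as pd


def dataframe_to_table_data(df: pd.DataFrame, date_format: str = '%Y-%m',
                            pct_decimals: int = 2) -> list:
    """Convert a date-indexed DataFrame of fractional values into rows of strings."""
    header = ['Period'] + [str(c) for c in df.columns]
    rows = [header]
    for idx, row in df.iterrows():
        # Format each value as a percentage; missing values become 'n/a'.
        cells = [f"{v * 100:.{pct_decimals}f}%" if pd.notna(v) else 'n/a' for v in row]
        rows.append([idx.strftime(date_format)] + cells)
    return rows


demo = pd.DataFrame(
    {'Return': [0.0123, -0.0045], 'Volatility': [0.15, 0.18]},
    index=pd.to_datetime(['2023-11-30', '2023-12-31']),
)
table_data = dataframe_to_table_data(demo)
for line in table_data:
    print(line)
```

The resulting list can be passed as the first argument to `Table(...)` in the same way as `summary_data`.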
Section 13.3: Excel Reports
Excel workbooks allow recipients to explore data interactively.
In this section, you will learn:
- Creating formatted Excel workbooks
- Multi-sheet reports
- Conditional formatting
13.3.1 Excel Report Generation
def generate_excel_report(returns: pd.Series, benchmark_returns: pd.Series,
metrics: dict, weights: dict,
filename: str = 'portfolio_report.xlsx') -> str:
"""
Generate a comprehensive Excel report.
"""
if not EXCEL_AVAILABLE:
print("Excel generation simulated (openpyxl not installed)")
return None
wb = openpyxl.Workbook()
# Define styles
header_font = Font(bold=True, color='FFFFFF', size=12)
header_fill = PatternFill(start_color='2E86AB', end_color='2E86AB', fill_type='solid')
percent_format = '0.00%'
# Summary Sheet
ws_summary = wb.active
ws_summary.title = 'Summary'
ws_summary['A1'] = 'Portfolio Performance Report'
ws_summary['A1'].font = Font(bold=True, size=16)
ws_summary['A2'] = f'Report Date: {datetime.now().strftime("%Y-%m-%d")}'
# Metrics
metrics_data = [
('Metric', 'Value'),
('Total Return', metrics['total_return']),
('Benchmark Return', metrics['benchmark_return']),
('Sharpe Ratio', metrics['sharpe_ratio']),
('Max Drawdown', metrics['max_drawdown']),
]
for i, (metric, value) in enumerate(metrics_data):
row = 4 + i
ws_summary[f'A{row}'] = metric
ws_summary[f'B{row}'] = value
if i == 0:
ws_summary[f'A{row}'].font = header_font
ws_summary[f'A{row}'].fill = header_fill
ws_summary[f'B{row}'].font = header_font
ws_summary[f'B{row}'].fill = header_fill
elif metric != 'Sharpe Ratio':
ws_summary[f'B{row}'].number_format = percent_format
ws_summary.column_dimensions['A'].width = 20
ws_summary.column_dimensions['B'].width = 15
# Monthly Returns Sheet with conditional formatting
ws_monthly = wb.create_sheet('Monthly Returns')
monthly = returns.resample('M').apply(lambda x: (1+x).prod() - 1)
ws_monthly['A1'] = 'Date'
ws_monthly['B1'] = 'Return'
ws_monthly['A1'].font = header_font
ws_monthly['A1'].fill = header_fill
ws_monthly['B1'].font = header_font
ws_monthly['B1'].fill = header_fill
for i, (date, ret) in enumerate(monthly.items()):
row = 2 + i
ws_monthly[f'A{row}'] = date.strftime('%Y-%m')
ws_monthly[f'B{row}'] = ret
ws_monthly[f'B{row}'].number_format = percent_format
# Color code
if ret > 0:
ws_monthly[f'B{row}'].fill = PatternFill(
start_color='C6EFCE', end_color='C6EFCE', fill_type='solid')
else:
ws_monthly[f'B{row}'].fill = PatternFill(
start_color='FFC7CE', end_color='FFC7CE', fill_type='solid')
wb.save(filename)
return filename
if EXCEL_AVAILABLE:
excel_file = generate_excel_report(portfolio_returns, benchmark_returns, metrics, weights)
print(f"Excel report generated: {excel_file}")
else:
print("Excel generation skipped (install openpyxl to enable)")
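The monthly sheet above applies fixed fills chosen in Python at write time, so the colors do not change if a recipient edits a value. As a hedged sketch of the alternative, the block below attaches conditional-formatting rules with openpyxl's `CellIsRule`, so that Excel evaluates the colors itself. The function name `build_monthly_sheet` and the sample values are assumptions for illustration; the import guard mirrors the module's setup cell.

```python
import io

try:
    from openpyxl import Workbook
    from openpyxl.styles import PatternFill
    from openpyxl.formatting.rule import CellIsRule
    OPENPYXL_OK = True
except ImportError:
    OPENPYXL_OK = False


def build_monthly_sheet(monthly_returns: dict):
    """Write returns to a sheet and attach green/red conditional-format rules.

    Returns the workbook as bytes, or None when openpyxl is unavailable.
    """
    if not OPENPYXL_OK:
        return None
    wb = Workbook()
    ws = wb.active
    ws.title = 'Monthly Returns'
    ws.append(['Date', 'Return'])
    for label, ret in monthly_returns.items():
        ws.append([label, ret])
    last_row = ws.max_row
    cell_range = f'B2:B{last_row}'
    for row in ws.iter_rows(min_row=2, max_row=last_row, min_col=2, max_col=2):
        row[0].number_format = '0.00%'
    green = PatternFill(start_color='C6EFCE', end_color='C6EFCE', fill_type='solid')
    red = PatternFill(start_color='FFC7CE', end_color='FFC7CE', fill_type='solid')
    # Rules are stored in the file and re-evaluated by Excel when values change.
    ws.conditional_formatting.add(
        cell_range, CellIsRule(operator='greaterThan', formula=['0'], fill=green))
    ws.conditional_formatting.add(
        cell_range, CellIsRule(operator='lessThanOrEqual', formula=['0'], fill=red))
    buffer = io.BytesIO()
    wb.save(buffer)
    return buffer.getvalue()


data = build_monthly_sheet({'2023-10': -0.012, '2023-11': 0.034, '2023-12': 0.021})
print('skipped (openpyxl not installed)' if data is None else f'{len(data)} bytes written')
```

Writing to an in-memory buffer keeps the sketch free of side effects; in a report pipeline the same workbook would be saved with `wb.save(filename)`.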
Exercise 13.3: Holdings Sheet Creator (Guided)
Your Task: Complete the function to create a formatted holdings sheet for an Excel report.
Fill in the blanks to complete the function:
Solution:
def create_holdings_dataframe(weights: dict, prices: pd.DataFrame) -> pd.DataFrame:
    """
    Create a holdings DataFrame with current values and metrics.
    """
    holdings = []
    for symbol, weight in weights.items():
        # Get the last price for this symbol
        last_price = prices[symbol].iloc[-1]
        # Calculate 1-day return
        daily_return = prices[symbol].pct_change().iloc[-1]
        # Calculate YTD return
        year_start = prices[prices.index.year == prices.index[-1].year][symbol].iloc[0]
        ytd_return = (last_price / year_start) - 1
        holdings.append({
            'Symbol': symbol,
            'Weight': weight,
            'Price': last_price,
            '1D Return': daily_return,
            'YTD Return': ytd_return
        })
    return pd.DataFrame(holdings)
# Test
holdings_df = create_holdings_dataframe(weights, prices)
print(holdings_df)
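The solution measures the YTD return from the first price of the current year, which leaves out the move on the first trading day. A common convention measures it from the last close of the previous year instead. The sketch below shows that variant on a tiny constructed price series. The helper name `ytd_return` and the fallback to the first available price are assumptions for illustration.

```python
import pandas as pd


def ytd_return(price: pd.Series) -> float:
    """YTD return measured from the last close of the previous calendar year.

    Falls back to the first available price when no prior-year data exists.
    """
    last_date = price.index[-1]
    prior = price[price.index.year < last_date.year]
    base = prior.iloc[-1] if len(prior) > 0 else price.iloc[0]
    return float(price.iloc[-1] / base - 1)


px = pd.Series(
    [100.0, 102.0, 110.0],
    index=pd.to_datetime(['2022-12-30', '2023-01-03', '2023-12-29']),
)
print(f"YTD from prior year-end close: {ytd_return(px):.2%}")
print(f"YTD from first price of year:  {px.iloc[-1] / px.iloc[1] - 1:.2%}")
```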
Exercise 13.4: Custom Report Builder (Open-ended)
Your Task:
Build a function that creates a DataFrame containing a quarterly performance summary:
- Quarterly returns (compound daily returns)
- Quarterly volatility (annualized)
- Quarterly Sharpe ratio
- Best and worst months within each quarter
Your implementation:
Solution:
def create_quarterly_summary(returns: pd.Series) -> pd.DataFrame:
"""
Create quarterly performance summary.
"""
# Quarterly returns
quarterly_return = returns.resample('Q').apply(lambda x: (1+x).prod() - 1)
# Quarterly volatility (annualized)
quarterly_vol = returns.resample('Q').std() * np.sqrt(252)
# Quarterly Sharpe
quarterly_sharpe = (returns.resample('Q').mean() * 252) / quarterly_vol
# Monthly returns for best/worst
monthly = returns.resample('M').apply(lambda x: (1+x).prod() - 1)
# Best/worst months per quarter
best_months = []
worst_months = []
for q_end in quarterly_return.index:
q_start = q_end - pd.offsets.QuarterEnd(1) + pd.offsets.Day(1)
q_months = monthly[(monthly.index >= q_start) & (monthly.index <= q_end)]
if len(q_months) > 0:
best_months.append(q_months.max())
worst_months.append(q_months.min())
else:
best_months.append(np.nan)
worst_months.append(np.nan)
return pd.DataFrame({
'Quarterly Return': quarterly_return,
'Volatility': quarterly_vol,
'Sharpe': quarterly_sharpe,
'Best Month': best_months,
'Worst Month': worst_months
})
# Test
quarterly = create_quarterly_summary(portfolio_returns)
print(quarterly)
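The loop in the solution finds each quarter's best and worst month by slicing on date offsets. As a sketch of a shorter alternative, the monthly series can be grouped by its calendar quarter with `to_period('Q')`. The helper name `best_worst_month_by_quarter` and the sample values are assumptions for illustration.

```python
import pandas as pd


def best_worst_month_by_quarter(monthly: pd.Series) -> pd.DataFrame:
    """Best and worst monthly return within each calendar quarter."""
    quarter = monthly.index.to_period('Q')  # e.g. 2023-02-28 -> 2023Q1
    grouped = monthly.groupby(quarter)
    return pd.DataFrame({'Best Month': grouped.max(), 'Worst Month': grouped.min()})


monthly = pd.Series(
    [0.01, -0.02, 0.03, 0.00, 0.04, -0.01],
    index=pd.to_datetime(['2023-01-31', '2023-02-28', '2023-03-31',
                          '2023-04-30', '2023-05-31', '2023-06-30']),
)
result = best_worst_month_by_quarter(monthly)
print(result)
```

The result is indexed by quarter periods rather than quarter-end timestamps, so it needs an index conversion (for example `result.index.to_timestamp(how='end')`) before being joined to the resampled quarterly series.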
Section 13.4: Performance Tear Sheets
Tear sheets provide a one-page summary of strategy performance for quick evaluation.
In this section, you will learn:
- Tear sheet design principles
- Multi-panel layouts
- Key visualizations
13.4.1 Creating Professional Tear Sheets
def create_tear_sheet(returns: pd.Series, benchmark_returns: pd.Series,
weights: dict, strategy_name: str = 'Portfolio'):
"""
Create a professional performance tear sheet.
"""
# Calculate metrics
metrics = calculate_report_metrics(returns, benchmark_returns)
# Create figure
fig = plt.figure(figsize=(12, 14))
fig.suptitle(f'{strategy_name} Performance Tear Sheet', fontsize=16, fontweight='bold', y=0.98)
gs = fig.add_gridspec(4, 2, height_ratios=[0.5, 1, 1, 1], hspace=0.3, wspace=0.3)
# Row 1: Key Metrics
ax_metrics = fig.add_subplot(gs[0, :])
ax_metrics.axis('off')
metrics_text = (
f"Total Return: {metrics['total_return']*100:.2f}% | "
f"Ann. Return: {metrics['ann_return']*100:.2f}% | "
f"Volatility: {metrics['ann_volatility']*100:.2f}% | "
f"Sharpe: {metrics['sharpe_ratio']:.2f} | "
f"Max DD: {metrics['max_drawdown']*100:.2f}%"
)
ax_metrics.text(0.5, 0.5, metrics_text, transform=ax_metrics.transAxes,
ha='center', va='center', fontsize=11,
bbox=dict(boxstyle='round', facecolor='lightgray', alpha=0.3))
# Row 2: Cumulative Returns
ax_cum = fig.add_subplot(gs[1, :])
cum_port = (1 + returns).cumprod()
cum_bench = (1 + benchmark_returns).cumprod()
ax_cum.plot(cum_port.index, cum_port, label='Portfolio', linewidth=2, color='#2E86AB')
ax_cum.plot(cum_bench.index, cum_bench, label='Benchmark', linewidth=1.5,
linestyle='--', color='gray')
ax_cum.set_ylabel('Cumulative Return')
ax_cum.set_title('Cumulative Returns')
ax_cum.legend(loc='upper left')
ax_cum.grid(True, alpha=0.3)
# Row 3: Drawdown
ax_dd = fig.add_subplot(gs[2, :])
running_max = cum_port.cummax()
drawdown = (cum_port - running_max) / running_max
ax_dd.fill_between(drawdown.index, drawdown * 100, 0, alpha=0.5, color='red')
ax_dd.plot(drawdown.index, drawdown * 100, color='darkred', linewidth=1)
ax_dd.set_ylabel('Drawdown (%)')
ax_dd.set_title('Drawdown')
ax_dd.grid(True, alpha=0.3)
# Row 4: Distribution and Allocation
ax_dist = fig.add_subplot(gs[3, 0])
ax_dist.hist(returns * 100, bins=50, alpha=0.7, color='steelblue', edgecolor='white')
ax_dist.axvline(returns.mean() * 100, color='red', linestyle='--',
label=f"Mean: {returns.mean()*100:.3f}%")
ax_dist.set_xlabel('Daily Return (%)')
ax_dist.set_ylabel('Frequency')
ax_dist.set_title('Return Distribution')
ax_dist.legend()
ax_dist.grid(True, alpha=0.3)
ax_pie = fig.add_subplot(gs[3, 1])
ax_pie.pie([weights[k] for k in weights], labels=weights.keys(), autopct='%1.1f%%',
colors=plt.cm.Set3(np.linspace(0, 1, len(weights))))
ax_pie.set_title('Current Allocation')
plt.tight_layout()
filename = f'{strategy_name.lower().replace(" ", "_")}_tearsheet.png'
plt.savefig(filename, dpi=150, bbox_inches='tight', facecolor='white')
plt.show()
return filename
# Create tear sheet
tearsheet = create_tear_sheet(portfolio_returns, benchmark_returns, weights, 'Balanced Portfolio')
print(f"\nTear sheet saved as: {tearsheet}")
Exercise 13.5: Risk Metrics Panel (Open-ended)
Your Task:
Build a function that creates a risk metrics visualization panel:
- Rolling volatility plot (21-day window)
- VaR histogram with 95% and 99% VaR lines marked
- Rolling beta to benchmark (63-day window)
- Return the figure object
Your implementation:
Solution:
def create_risk_panel(returns: pd.Series, benchmark_returns: pd.Series) -> plt.Figure:
"""
Create a risk metrics visualization panel.
"""
fig, axes = plt.subplots(2, 2, figsize=(12, 8))
# 1. Rolling volatility
ax1 = axes[0, 0]
rolling_vol = returns.rolling(21).std() * np.sqrt(252) * 100
ax1.plot(rolling_vol.index, rolling_vol, color='orange', linewidth=1)
ax1.axhline(returns.std() * np.sqrt(252) * 100, color='black', linestyle='--', alpha=0.7)
ax1.set_title('21-Day Rolling Volatility (%)')
ax1.set_ylabel('Volatility (%)')
ax1.grid(True, alpha=0.3)
# 2. VaR histogram
ax2 = axes[0, 1]
var_95 = np.percentile(returns, 5)
var_99 = np.percentile(returns, 1)
ax2.hist(returns * 100, bins=50, alpha=0.7, color='steelblue', edgecolor='white')
ax2.axvline(var_95 * 100, color='orange', linestyle='--', linewidth=2, label=f'95% VaR: {var_95*100:.2f}%')
ax2.axvline(var_99 * 100, color='red', linestyle='--', linewidth=2, label=f'99% VaR: {var_99*100:.2f}%')
ax2.set_title('Return Distribution with VaR')
ax2.set_xlabel('Daily Return (%)')
ax2.legend()
ax2.grid(True, alpha=0.3)
# 3. Rolling beta
ax3 = axes[1, 0]
rolling_cov = returns.rolling(63).cov(benchmark_returns)
rolling_var = benchmark_returns.rolling(63).var()
rolling_beta = rolling_cov / rolling_var
ax3.plot(rolling_beta.index, rolling_beta, color='purple', linewidth=1)
ax3.axhline(1.0, color='black', linestyle='--', alpha=0.5)
ax3.set_title('63-Day Rolling Beta')
ax3.set_ylabel('Beta')
ax3.grid(True, alpha=0.3)
# 4. Drawdown underwater chart
ax4 = axes[1, 1]
cum_returns = (1 + returns).cumprod()
drawdown = (cum_returns - cum_returns.cummax()) / cum_returns.cummax()
ax4.fill_between(drawdown.index, drawdown * 100, 0, alpha=0.5, color='red')
ax4.set_title('Drawdown')
ax4.set_ylabel('Drawdown (%)')
ax4.grid(True, alpha=0.3)
plt.tight_layout()
return fig
# Test
fig = create_risk_panel(portfolio_returns, benchmark_returns)
plt.show()
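The risk panel marks VaR as the raw (negative) return quantile, while `calculate_report_metrics` reports `var_95` as a positive loss. Mixing the two sign conventions in one report is easy to do by accident. The sketch below computes historical VaR and expected shortfall in one place, both as positive losses. The helper name `historical_var_es` and the synthetic sample are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def historical_var_es(returns: pd.Series, confidence: float = 0.95) -> tuple:
    """Historical VaR and expected shortfall, both reported as positive losses."""
    cutoff = np.percentile(returns, (1 - confidence) * 100)  # left-tail return quantile
    var = -cutoff
    es = -returns[returns <= cutoff].mean()  # average loss at or beyond the cutoff
    return float(var), float(es)


rng = np.random.default_rng(42)
sample = pd.Series(rng.normal(0.0004, 0.01, 2000))  # synthetic daily returns
for level in (0.95, 0.99):
    var, es = historical_var_es(sample, level)
    print(f"{level:.0%} VaR: {var:.2%} | ES: {es:.2%}")
```

To draw the line on a return histogram, plot `-var`; the label can then state the loss as a positive number.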
Exercise 13.6: Complete Reporting Suite (Open-ended)
Your Task:
Build a ReportGenerator class that:
- Takes portfolio returns, benchmark returns, and weights in the constructor
- Has a calculate_all_metrics() method returning a dictionary of metrics
- Has a print_summary() method that prints a formatted console report
- Has a create_charts() method that creates and saves performance/drawdown charts
- Has a generate_report() method that calls all above methods in sequence
Your implementation:
Solution:
class ReportGenerator:
"""
Complete financial reporting suite.
"""
def __init__(self, portfolio_returns: pd.Series, benchmark_returns: pd.Series,
weights: dict, strategy_name: str = 'Portfolio'):
self.returns = portfolio_returns
self.benchmark = benchmark_returns
self.weights = weights
self.name = strategy_name
self.metrics = None
def calculate_all_metrics(self) -> dict:
"""Calculate comprehensive metrics."""
cum_port = (1 + self.returns).cumprod()
cum_bench = (1 + self.benchmark).cumprod()
total_return = cum_port.iloc[-1] - 1
bench_return = cum_bench.iloc[-1] - 1
n_years = len(self.returns) / 252
ann_return = (1 + total_return) ** (1/n_years) - 1
ann_vol = self.returns.std() * np.sqrt(252)
sharpe = ann_return / ann_vol
running_max = cum_port.cummax()
max_dd = ((cum_port - running_max) / running_max).min()
self.metrics = {
'total_return': total_return,
'benchmark_return': bench_return,
'active_return': total_return - bench_return,
'ann_return': ann_return,
'ann_volatility': ann_vol,
'sharpe_ratio': sharpe,
'max_drawdown': max_dd,
'start_date': self.returns.index[0],
'end_date': self.returns.index[-1]
}
return self.metrics
def print_summary(self):
"""Print formatted summary report."""
if self.metrics is None:
self.calculate_all_metrics()
print(f"\n{'='*50}")
print(f"{self.name} Performance Report")
print(f"{'='*50}")
print(f"Period: {self.metrics['start_date'].strftime('%Y-%m-%d')} to {self.metrics['end_date'].strftime('%Y-%m-%d')}")
print(f"\nRETURNS")
print(f" Total Return: {self.metrics['total_return']*100:>8.2f}%")
print(f" Benchmark Return: {self.metrics['benchmark_return']*100:>8.2f}%")
print(f" Annualized Return: {self.metrics['ann_return']*100:>8.2f}%")
print(f"\nRISK")
print(f" Volatility: {self.metrics['ann_volatility']*100:>8.2f}%")
print(f" Max Drawdown: {self.metrics['max_drawdown']*100:>8.2f}%")
print(f" Sharpe Ratio: {self.metrics['sharpe_ratio']:>8.2f}")
print(f"{'='*50}\n")
def create_charts(self, output_dir: str = '.'):
"""Create and save performance charts."""
# Performance chart
fig, axes = plt.subplots(2, 1, figsize=(10, 8))
cum_port = (1 + self.returns).cumprod()
cum_bench = (1 + self.benchmark).cumprod()
axes[0].plot(cum_port.index, cum_port, label='Portfolio', linewidth=2)
axes[0].plot(cum_bench.index, cum_bench, label='Benchmark', linestyle='--')
axes[0].set_title('Cumulative Returns')
axes[0].legend()
axes[0].grid(True, alpha=0.3)
drawdown = (cum_port - cum_port.cummax()) / cum_port.cummax()
axes[1].fill_between(drawdown.index, drawdown * 100, 0, alpha=0.5, color='red')
axes[1].set_title('Drawdown')
axes[1].set_ylabel('Drawdown (%)')
axes[1].grid(True, alpha=0.3)
plt.tight_layout()
filename = f'{output_dir}/{self.name.lower().replace(" ", "_")}_charts.png'
plt.savefig(filename, dpi=150, bbox_inches='tight')
plt.close()
return filename
def generate_report(self, output_dir: str = '.'):
"""Generate complete report package."""
self.calculate_all_metrics()
self.print_summary()
chart_file = self.create_charts(output_dir)
print(f"Charts saved: {chart_file}")
return self.metrics
# Test
reporter = ReportGenerator(portfolio_returns, benchmark_returns, weights, 'Balanced Growth')
reporter.generate_report()
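`create_charts` builds its filename with an f-string and assumes `output_dir` already exists, so saving fails when it is pointed at a new folder. As a minimal sketch, the path can be built with `pathlib` and the directory created on demand. The helper name `chart_path` is hypothetical, and the temporary directory is used only to keep the demonstration self-contained.

```python
import tempfile
from pathlib import Path


def chart_path(output_dir: str, strategy_name: str, suffix: str = 'charts.png') -> Path:
    """Build the chart filename and make sure its directory exists."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    slug = strategy_name.lower().replace(' ', '_')
    return out / f'{slug}_{suffix}'


with tempfile.TemporaryDirectory() as tmp:
    target = chart_path(str(Path(tmp) / 'reports' / '2024'), 'Balanced Growth')
    created = target.parent.is_dir()
    print(target.name, created)
```

`plt.savefig` accepts path-like objects, so the returned `Path` can be passed to it directly.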
Module Project: Complete Reporting Suite
Build a comprehensive reporting system that combines all concepts from this module.
Your Challenge:
Build a ReportingSuite class that includes:
1. Comprehensive metrics calculation
2. Console summary printing
3. Chart generation
4. Tear sheet creation
5. Optional PDF and Excel generation
# YOUR CODE HERE - Module Project
Solution:
class ReportingSuite:
"""
Professional financial reporting system.
Features:
- Comprehensive metrics calculation
- Console summary reports
- Chart generation
- Performance tear sheets
- Optional PDF and Excel reports
"""
def __init__(self, portfolio_returns: pd.Series, benchmark_returns: pd.Series,
weights: dict, strategy_name: str = 'Portfolio'):
self.returns = portfolio_returns
self.benchmark = benchmark_returns
self.weights = weights
self.name = strategy_name
self.metrics = self._calculate_metrics()
def _calculate_metrics(self) -> dict:
"""Calculate all performance metrics."""
cum_port = (1 + self.returns).cumprod()
cum_bench = (1 + self.benchmark).cumprod()
total_return = cum_port.iloc[-1] - 1
bench_return = cum_bench.iloc[-1] - 1
n_years = len(self.returns) / 252
ann_return = (1 + total_return) ** (1/n_years) - 1
ann_vol = self.returns.std() * np.sqrt(252)
sharpe = ann_return / ann_vol
# Downside metrics
downside = self.returns[self.returns < 0]
sortino = ann_return / (downside.std() * np.sqrt(252))
# Drawdown
running_max = cum_port.cummax()
drawdown = (cum_port - running_max) / running_max
max_dd = drawdown.min()
# Relative metrics
active = self.returns - self.benchmark
tracking_error = active.std() * np.sqrt(252)
info_ratio = (active.mean() * 252) / tracking_error
# Beta/Alpha
cov = np.cov(self.returns, self.benchmark)[0, 1]
beta = cov / self.benchmark.var()
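# Annualize the benchmark geometrically so alpha compares like with like
ann_bench_return = (1 + bench_return) ** (1/n_years) - 1
alpha = ann_return - beta * ann_bench_return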
return {
'total_return': total_return,
'benchmark_return': bench_return,
'active_return': total_return - bench_return,
'ann_return': ann_return,
'ann_volatility': ann_vol,
'sharpe_ratio': sharpe,
'sortino_ratio': sortino,
'max_drawdown': max_dd,
'tracking_error': tracking_error,
'information_ratio': info_ratio,
'beta': beta,
'alpha': alpha,
'var_95': -np.percentile(self.returns, 5),
'win_rate': (self.returns > 0).mean(),
'start_date': self.returns.index[0],
'end_date': self.returns.index[-1]
}
def print_summary(self):
"""Print formatted summary to console."""
m = self.metrics
print(f"\n{'='*60}")
print(f"{self.name} Performance Report")
print(f"{'='*60}")
print(f"Period: {m['start_date'].strftime('%Y-%m-%d')} to {m['end_date'].strftime('%Y-%m-%d')}")
print(f"\nRETURN METRICS")
print(f"-" * 40)
print(f" Total Return: {m['total_return']*100:>10.2f}%")
print(f" Benchmark Return: {m['benchmark_return']*100:>10.2f}%")
print(f" Active Return: {m['active_return']*100:>10.2f}%")
print(f" Annualized Return: {m['ann_return']*100:>10.2f}%")
print(f"\nRISK METRICS")
print(f"-" * 40)
print(f" Volatility: {m['ann_volatility']*100:>10.2f}%")
print(f" Max Drawdown: {m['max_drawdown']*100:>10.2f}%")
print(f" 95% VaR: {m['var_95']*100:>10.2f}%")
print(f" Tracking Error: {m['tracking_error']*100:>10.2f}%")
print(f"\nRISK-ADJUSTED")
print(f"-" * 40)
print(f" Sharpe Ratio: {m['sharpe_ratio']:>10.2f}")
print(f" Sortino Ratio: {m['sortino_ratio']:>10.2f}")
print(f" Info Ratio: {m['information_ratio']:>10.2f}")
print(f"{'='*60}\n")
def create_charts(self, output_dir: str = '.') -> str:
"""Generate performance and drawdown charts."""
fig, axes = plt.subplots(2, 1, figsize=(10, 8))
cum_port = (1 + self.returns).cumprod()
cum_bench = (1 + self.benchmark).cumprod()
axes[0].plot(cum_port.index, cum_port, label='Portfolio', linewidth=2, color='#2E86AB')
axes[0].plot(cum_bench.index, cum_bench, label='Benchmark', linestyle='--', color='gray')
axes[0].set_title(f'{self.name} - Cumulative Returns')
axes[0].legend(loc='upper left')
axes[0].grid(True, alpha=0.3)
drawdown = (cum_port - cum_port.cummax()) / cum_port.cummax()
axes[1].fill_between(drawdown.index, drawdown * 100, 0, alpha=0.5, color='red')
axes[1].plot(drawdown.index, drawdown * 100, color='darkred', linewidth=1)
axes[1].set_title('Drawdown')
axes[1].set_ylabel('Drawdown (%)')
axes[1].grid(True, alpha=0.3)
plt.tight_layout()
filename = f'{output_dir}/{self.name.lower().replace(" ", "_")}_charts.png'
plt.savefig(filename, dpi=150, bbox_inches='tight')
plt.show()
return filename
def create_tear_sheet(self, output_dir: str = '.') -> str:
"""Generate one-page tear sheet."""
fig = plt.figure(figsize=(12, 14))
fig.suptitle(f'{self.name} Tear Sheet', fontsize=16, fontweight='bold', y=0.98)
gs = fig.add_gridspec(4, 2, height_ratios=[0.4, 1, 1, 1], hspace=0.3, wspace=0.3)
# Metrics summary
ax_sum = fig.add_subplot(gs[0, :])
ax_sum.axis('off')
m = self.metrics
text = (f"Return: {m['total_return']*100:.1f}% | "
f"Vol: {m['ann_volatility']*100:.1f}% | "
f"Sharpe: {m['sharpe_ratio']:.2f} | "
f"Max DD: {m['max_drawdown']*100:.1f}%")
ax_sum.text(0.5, 0.5, text, ha='center', va='center', fontsize=12,
bbox=dict(boxstyle='round', facecolor='lightgray', alpha=0.3))
# Cumulative returns
ax_cum = fig.add_subplot(gs[1, :])
cum_port = (1 + self.returns).cumprod()
cum_bench = (1 + self.benchmark).cumprod()
ax_cum.plot(cum_port, label='Portfolio', linewidth=2)
ax_cum.plot(cum_bench, label='Benchmark', linestyle='--', alpha=0.7)
ax_cum.set_title('Cumulative Returns')
ax_cum.legend()
ax_cum.grid(True, alpha=0.3)
# Drawdown
ax_dd = fig.add_subplot(gs[2, :])
dd = (cum_port - cum_port.cummax()) / cum_port.cummax()
ax_dd.fill_between(dd.index, dd * 100, 0, alpha=0.5, color='red')
ax_dd.set_title('Drawdown')
ax_dd.set_ylabel('%')
ax_dd.grid(True, alpha=0.3)
# Distribution
ax_hist = fig.add_subplot(gs[3, 0])
ax_hist.hist(self.returns * 100, bins=50, alpha=0.7, color='steelblue')
ax_hist.axvline(self.returns.mean() * 100, color='red', linestyle='--')
ax_hist.set_title('Return Distribution')
ax_hist.set_xlabel('Daily Return (%)')
ax_hist.grid(True, alpha=0.3)
# Allocation
ax_pie = fig.add_subplot(gs[3, 1])
ax_pie.pie(list(self.weights.values()), labels=list(self.weights.keys()),
autopct='%1.1f%%', colors=plt.cm.Set3(np.linspace(0, 1, len(self.weights))))
ax_pie.set_title('Allocation')
plt.tight_layout()
filename = f'{output_dir}/{self.name.lower().replace(" ", "_")}_tearsheet.png'
plt.savefig(filename, dpi=150, bbox_inches='tight', facecolor='white')
plt.show()
return filename
def generate_all_reports(self, output_dir: str = '.'):
"""Generate complete report package."""
print(f"Generating reports for {self.name}...\n")
self.print_summary()
chart_file = self.create_charts(output_dir)
tearsheet_file = self.create_tear_sheet(output_dir)
print(f"\nReports generated:")
print(f" - Charts: {chart_file}")
print(f" - Tear Sheet: {tearsheet_file}")
print("\nReport generation complete!")
# Demo
suite = ReportingSuite(portfolio_returns, benchmark_returns, weights, 'Balanced Growth')
suite.generate_all_reports()
# Solution - Module Project
class ReportingSuite:
"""
Professional financial reporting system.
"""
def __init__(self, portfolio_returns: pd.Series, benchmark_returns: pd.Series,
weights: dict, strategy_name: str = 'Portfolio'):
self.returns = portfolio_returns
self.benchmark = benchmark_returns
self.weights = weights
self.name = strategy_name
self.metrics = self._calculate_metrics()
def _calculate_metrics(self) -> dict:
"""Calculate all performance metrics."""
return calculate_report_metrics(self.returns, self.benchmark)
def print_summary(self):
"""Print formatted summary to console."""
m = self.metrics
print(f"\n{'='*60}")
print(f"{self.name} Performance Report")
print(f"{'='*60}")
print(f"Period: {m['start_date'].strftime('%Y-%m-%d')} to {m['end_date'].strftime('%Y-%m-%d')}")
print(f"\nRETURN METRICS")
print(f" Total Return: {m['total_return']*100:>10.2f}%")
print(f" Benchmark Return: {m['benchmark_return']*100:>10.2f}%")
print(f" Annualized Return: {m['ann_return']*100:>10.2f}%")
print(f"\nRISK METRICS")
print(f" Volatility: {m['ann_volatility']*100:>10.2f}%")
print(f" Max Drawdown: {m['max_drawdown']*100:>10.2f}%")
print(f"\nRISK-ADJUSTED")
print(f" Sharpe Ratio: {m['sharpe_ratio']:>10.2f}")
print(f" Sortino Ratio: {m['sortino_ratio']:>10.2f}")
print(f"{'='*60}\n")
def generate_all_reports(self, output_dir: str = '.'):
"""Generate complete report package."""
print(f"Generating reports for {self.name}...")
self.print_summary()
create_tear_sheet(self.returns, self.benchmark, self.weights, self.name)
print("\nReport generation complete!")
# Demo
suite = ReportingSuite(portfolio_returns, benchmark_returns, weights, 'Balanced Growth')
suite.print_summary()
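The metrics in this module divide annualized return by volatility without subtracting a risk-free rate, and the Sortino denominator uses the standard deviation of negative returns only. A common alternative works with excess returns and a downside deviation taken over all observations. The sketch below illustrates that variant; `risk_adjusted_ratios` and `rf_annual` are hypothetical names, not part of the course code, and the annualization is arithmetic (mean times 252) rather than the geometric figure used in `ReportingSuite`.

```python
import numpy as np
import pandas as pd

def risk_adjusted_ratios(returns: pd.Series, rf_annual: float = 0.0,
                         periods_per_year: int = 252) -> dict:
    """Sharpe and Sortino ratios on excess returns (illustrative helper)."""
    rf_per_period = rf_annual / periods_per_year
    excess = returns - rf_per_period
    ann_excess = excess.mean() * periods_per_year
    ann_vol = returns.std() * np.sqrt(periods_per_year)
    # Downside deviation: root-mean-square of below-target excess returns,
    # averaged over ALL observations (zero where the target was met).
    downside = np.minimum(excess, 0.0)
    downside_dev = np.sqrt((downside ** 2).mean()) * np.sqrt(periods_per_year)
    return {
        'sharpe': ann_excess / ann_vol,
        'sortino': ann_excess / downside_dev if downside_dev > 0 else np.nan,
    }

# Synthetic daily returns, used only to exercise the function
rng = np.random.default_rng(0)
demo_returns = pd.Series(rng.normal(0.0005, 0.01, 504))
ratios = risk_adjusted_ratios(demo_returns, rf_annual=0.02)
print(ratios)
```

Raising `rf_annual` lowers both ratios, because the numerator shrinks while total volatility is unchanged.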
Key Takeaways
What You Learned
1. Report Design Principles
- Structure reports with executive summary first
- Tailor content to audience (client vs internal vs regulatory)
- Include benchmarks for context
2. PDF Generation
- ReportLab provides professional PDF output
- Use tables for metrics, images for charts
- Include proper disclaimers
3. Excel Reports
- openpyxl enables formatted workbooks
- Color-code positive/negative values
- Separate sheets for different data views
4. Tear Sheets
- One-page summary of strategy performance
- Include all key metrics and visualizations
- Useful for marketing and quick reviews
Coming Up Next
In Module 14: Rebalancing & Execution, we'll explore:
- Rebalancing strategies (calendar, threshold)
- Transaction cost analysis
- Tax-loss harvesting
- Implementation shortfall
Congratulations on completing Module 13!
Module 14: Rebalancing & Execution
Course 3: Quantitative Finance & Portfolio Theory
Part 5: Production & Infrastructure
Learning Objectives
By the end of this module, you will be able to:
- Implement calendar and threshold rebalancing strategies
- Model and minimize transaction costs
- Apply tax-loss harvesting techniques
- Measure implementation shortfall and execution quality
| Attribute | Value |
|---|---|
| Duration | ~2.5 hours |
| Exercises | 6 (3 guided + 3 open-ended) |
| Prerequisites | Modules 5 and 11 |
Setup and Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
from datetime import datetime, timedelta
from scipy.optimize import minimize
import warnings
warnings.filterwarnings('ignore')
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
np.random.seed(42)
print('Module 14: Rebalancing & Execution - Ready!')
Load Data
# Download data
tickers = ['SPY', 'AGG', 'GLD', 'VNQ']
data = yf.download(tickers, start='2020-01-01', end='2024-01-01', progress=False)
if isinstance(data.columns, pd.MultiIndex):
    prices = data['Adj Close'] if 'Adj Close' in data.columns.get_level_values(0) else data['Close']
else:
    prices = data
returns = prices.pct_change().dropna()
# Target allocation
target_weights = {'SPY': 0.50, 'AGG': 0.30, 'GLD': 0.10, 'VNQ': 0.10}
print(f'Data loaded: {len(returns)} trading days')
Section 14.1: Rebalancing Strategies
Over time, assets with higher returns grow to dominate your portfolio, increasing concentration risk and drifting away from your intended allocation.
In this section, you will learn:
- Calendar rebalancing (fixed schedule)
- Threshold rebalancing (drift-triggered)
- Hybrid approaches
14.1.1 Portfolio Drift Simulation
def simulate_portfolio_drift(initial_weights: dict, returns_df: pd.DataFrame,
                             rebalance_frequency: str = None) -> pd.DataFrame:
    """
    Simulate how portfolio weights drift over time.

    Parameters:
    -----------
    initial_weights : dict
        Target weights for each asset
    returns_df : DataFrame
        Daily returns for each asset
    rebalance_frequency : str
        'M' for monthly, 'Q' for quarterly, None for never
    """
    assets = list(initial_weights.keys())
    weights = pd.DataFrame(index=returns_df.index, columns=assets, dtype=float)
    current_weights = np.array([initial_weights[a] for a in assets])
    if rebalance_frequency:
        # Use the last actual trading day in each period. The resampled index
        # holds calendar period-ends, which often fall on weekends or holidays
        # and would never match a date in returns_df.index.
        rebalance_dates = set(
            returns_df.index.to_series().resample(rebalance_frequency).last().dropna()
        )
    else:
        rebalance_dates = set()
    for date in returns_df.index:
        # Record start-of-day weights, then let each asset grow by its return
        weights.loc[date] = current_weights
        day_returns = returns_df.loc[date, assets].values
        growth = 1 + day_returns
        new_values = current_weights * growth
        current_weights = new_values / new_values.sum()
        if date in rebalance_dates:
            # Reset to target at the close of a rebalance day
            current_weights = np.array([initial_weights[a] for a in assets])
    return weights
# Simulate drift scenarios
weights_never = simulate_portfolio_drift(target_weights, returns, None)
weights_monthly = simulate_portfolio_drift(target_weights, returns, 'M')
weights_quarterly = simulate_portfolio_drift(target_weights, returns, 'Q')
print("Final SPY weight by rebalancing approach:")
print(f" Never rebalanced: {weights_never['SPY'].iloc[-1]:.1%}")
print(f" Monthly: {weights_monthly['SPY'].iloc[-1]:.1%}")
print(f" Quarterly: {weights_quarterly['SPY'].iloc[-1]:.1%}")
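The drift mechanics above reduce to a single update per period: each weight grows by its asset's return and the result is renormalized. The sketch below shows one step with made-up returns so it runs without downloading data; `drift_one_period` is a hypothetical helper and the return vector is illustrative only.

```python
import numpy as np

def drift_one_period(weights: np.ndarray, period_returns: np.ndarray) -> np.ndarray:
    """Return end-of-period weights after each asset grows by its own return."""
    grown = weights * (1.0 + period_returns)
    return grown / grown.sum()

w0 = np.array([0.50, 0.30, 0.10, 0.10])   # target weights in SPY, AGG, GLD, VNQ order
r = np.array([0.10, 0.00, 0.00, -0.05])   # hypothetical one-period returns
w1 = drift_one_period(w0, r)
print(np.round(w1, 4))  # [0.5263 0.2871 0.0957 0.0909]
```

The asset with the highest return gains weight even though nothing was traded, which is the drift that rebalancing corrects.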
14.1.2 Threshold Rebalancing
class ThresholdRebalancer:
"""
Rebalances when any asset drifts beyond a threshold.
"""
def __init__(self, target_weights: dict, threshold: float = 0.05):
self.target_weights = target_weights
self.threshold = threshold
self.assets = list(target_weights.keys())
def needs_rebalance(self, current_weights: dict) -> bool:
"""Check if any asset has drifted beyond threshold."""
for asset, target in self.target_weights.items():
current = current_weights.get(asset, 0)
if abs(current - target) > self.threshold:
return True
return False
def backtest(self, returns_df: pd.DataFrame, initial_value: float = 100000,
transaction_cost: float = 0.001) -> tuple:
"""Backtest the threshold rebalancing strategy."""
portfolio_value = initial_value
holdings = {a: self.target_weights[a] * portfolio_value for a in self.assets}
values = []
total_costs = 0
num_rebalances = 0
for date in returns_df.index:
for asset in self.assets:
holdings[asset] *= (1 + returns_df.loc[date, asset])
portfolio_value = sum(holdings.values())
current_weights = {a: holdings[a]/portfolio_value for a in self.assets}
if self.needs_rebalance(current_weights):
turnover = sum(abs(self.target_weights[a] - current_weights[a])
for a in self.assets) * portfolio_value / 2
cost = turnover * transaction_cost
total_costs += cost
num_rebalances += 1
portfolio_value -= cost
holdings = {a: self.target_weights[a] * portfolio_value for a in self.assets}
values.append({'date': date, 'value': portfolio_value})
return pd.DataFrame(values).set_index('date'), total_costs, num_rebalances
# Compare thresholds
print("Threshold Rebalancing Comparison")
print("=" * 50)
for thresh in [0.03, 0.05, 0.10]:
    rebalancer = ThresholdRebalancer(target_weights, threshold=thresh)
    values, costs, num_rebal = rebalancer.backtest(returns)
    print(f"{thresh:.0%} Threshold: Final=${values['value'].iloc[-1]:,.0f}, "
          f"Rebalances={num_rebal}, Costs=${costs:,.0f}")
Exercise 14.1: Drift Calculator (Guided)
Your Task: Complete the function to calculate portfolio drift metrics.
Fill in the blanks to complete the function:
Solution:
def calculate_drift_metrics(current_weights: dict, target_weights: dict) -> dict:
    """
    Calculate portfolio drift from target allocation.
    """
    drifts = {}
    for asset in target_weights:
        current = current_weights.get(asset, 0)
        target = target_weights[asset]
        # Signed drift: positive means overweight, negative means underweight
        drifts[asset] = current - target
    # Find the maximum absolute drift
    max_drift = max(abs(d) for d in drifts.values())
    # Calculate total absolute drift
    total_drift = sum(abs(d) for d in drifts.values())
    return {
        'drifts': drifts,
        'max_drift': max_drift,
        'total_drift': total_drift
    }
# Test
current = {'SPY': 0.55, 'AGG': 0.25, 'GLD': 0.12, 'VNQ': 0.08}
metrics = calculate_drift_metrics(current, target_weights)
print(f"Max drift: {metrics['max_drift']:.1%}")
print(f"Total drift: {metrics['total_drift']:.1%}")
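Total absolute drift links directly to the cost of rebalancing: every overweight dollar sold funds an underweight dollar bought, so one-way turnover is half the total drift. The sketch below applies that to the test weights above, charging the cost rate on one-way turnover as `ThresholdRebalancer.backtest` does. `estimate_rebalance_cost` is a hypothetical helper, and the $100,000 portfolio value and 10 bps rate are assumptions chosen for illustration.

```python
def estimate_rebalance_cost(current_weights: dict, target_weights: dict,
                            portfolio_value: float, cost_rate: float = 0.001) -> dict:
    """Estimate the cost of a full rebalance from weight drift alone."""
    total_drift = sum(abs(current_weights.get(a, 0.0) - t)
                      for a, t in target_weights.items())
    one_way_turnover = total_drift / 2          # buys equal sells
    traded_value = one_way_turnover * portfolio_value
    return {
        'one_way_turnover': one_way_turnover,
        'traded_value': traded_value,
        'estimated_cost': traded_value * cost_rate,
    }

demo_target = {'SPY': 0.50, 'AGG': 0.30, 'GLD': 0.10, 'VNQ': 0.10}
demo_current = {'SPY': 0.55, 'AGG': 0.25, 'GLD': 0.12, 'VNQ': 0.08}
estimate = estimate_rebalance_cost(demo_current, demo_target, portfolio_value=100_000)
print(f"One-way turnover: {estimate['one_way_turnover']:.1%}")   # 7.0%
print(f"Estimated cost:   ${estimate['estimated_cost']:.2f}")    # $7.00
```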
Section 14.2: Transaction Cost Analysis
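Transaction costs are paid on every trade, so a strategy that rebalances frequently can give up a meaningful share of its return to commissions, bid-ask spreads, and market impact.
In this section, you will learn:
- Types of transaction costs (explicit and implicit)
- Market impact modeling
- Cost-aware portfolio construction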
14.2.1 Transaction Cost Model
class TransactionCostModel:
"""
Comprehensive transaction cost estimator.
"""
def __init__(self, commission_per_share: float = 0.005,
commission_min: float = 1.0):
self.commission_per_share = commission_per_share
self.commission_min = commission_min
def estimate_spread_cost(self, price: float, spread_bps: float = 10) -> float:
"""Estimate bid-ask spread cost (half spread per transaction)."""
spread = price * (spread_bps / 10000)
return spread / 2
def estimate_market_impact(self, trade_size: int, avg_daily_volume: int,
price: float, volatility: float = 0.02) -> float:
"""
Estimate market impact using square-root model.
Impact = sigma * sqrt(Q/V)
"""
participation_rate = trade_size / avg_daily_volume
impact_pct = volatility * np.sqrt(participation_rate)
return price * impact_pct
def total_cost(self, shares: int, price: float,
avg_daily_volume: int = 1000000,
volatility: float = 0.02) -> dict:
"""Calculate total transaction cost."""
trade_value = shares * price
commission = max(shares * self.commission_per_share, self.commission_min)
spread_cost = self.estimate_spread_cost(price) * shares
impact_cost = self.estimate_market_impact(
shares, avg_daily_volume, price, volatility
) * shares
return {
'commission': commission,
'spread': spread_cost,
'market_impact': impact_cost,
'total': commission + spread_cost + impact_cost,
'total_bps': (commission + spread_cost + impact_cost) / trade_value * 10000
}
# Example
cost_model = TransactionCostModel()
costs = cost_model.total_cost(shares=1000, price=150, avg_daily_volume=5_000_000)
print("Transaction Cost Breakdown (1000 shares @ $150)")
print("=" * 45)
for component, value in costs.items():
if component == 'total_bps':
print(f"Total: {value:.2f} bps")
else:
print(f"{component.capitalize()}: ${value:.2f}")
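To see where each number in the breakdown comes from, the same arithmetic can be written out by hand. This is a standalone check that mirrors the formulas in `TransactionCostModel` with its default parameters (half of a 10 bps spread, $0.005 per share commission, 2% volatility); the variable names are illustrative.

```python
import math

shares, price, adv, sigma = 1000, 150.0, 5_000_000, 0.02

commission = max(shares * 0.005, 1.0)            # $5.00, above the $1 minimum
half_spread = price * (10 / 10_000) / 2          # $0.075 per share
spread_cost = half_spread * shares               # $75.00
impact_pct = sigma * math.sqrt(shares / adv)     # about 2.83 bps of price
impact_cost = price * impact_pct * shares        # about $42.43

total = commission + spread_cost + impact_cost
total_bps = total / (shares * price) * 10_000
print(f"Total: ${total:.2f} ({total_bps:.2f} bps)")  # Total: $122.43 (8.16 bps)

# Square-root scaling: quadrupling the order doubles the per-share impact
impact_pct_4x = sigma * math.sqrt(4 * shares / adv)
print(f"Impact ratio for a 4x order: {impact_pct_4x / impact_pct:.1f}")  # 2.0
```

Because per-share impact grows with the square root of order size, total impact cost grows faster than linearly, which is one reason large orders are often split.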
Exercise 14.2: Break-Even Holding Period (Guided)
Your Task: Complete the function to calculate how long you need to hold a position for expected returns to overcome transaction costs.
Fill in the blanks to complete the function:
Solution:
def break_even_holding_period(expected_annual_return: float,
                              round_trip_cost: float) -> float:
    """
    Calculate minimum holding period (in trading days) for returns to exceed costs.
    """
    # With a zero or negative expected return, costs are never recovered
    if expected_annual_return <= 0:
        return float('inf')
    # Calculate daily expected return (252 trading days/year)
    daily_return = expected_annual_return / 252
    # Break-even when daily_return * days = round_trip_cost
    break_even_days = round_trip_cost / daily_return
    return break_even_days
# Test
scenarios = [
(0.10, 0.002, "Stock 10% return, 20 bps cost"),
(0.20, 0.002, "Growth 20% return, 20 bps cost"),
(0.05, 0.001, "Bond 5% return, 10 bps cost")
]
for ret, cost, desc in scenarios:
days = break_even_holding_period(ret, cost)
print(f"{desc}: {days:.1f} days ({days/21:.1f} months)")
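The solution above treats returns as accruing linearly. If returns compound instead, break-even occurs when (1 + R)^(d/252) = 1 + c, which gives d = 252 · ln(1 + c) / ln(1 + R). The sketch below compares the two for the first scenario; `break_even_days_compound` is a hypothetical variant, not part of the exercise.

```python
import math

def break_even_days_compound(expected_annual_return: float, round_trip_cost: float,
                             periods_per_year: int = 252) -> float:
    """Break-even holding period assuming returns compound at a constant rate."""
    if expected_annual_return <= 0:
        return float('inf')
    return (periods_per_year * math.log(1 + round_trip_cost)
            / math.log(1 + expected_annual_return))

linear_days = 0.002 / (0.10 / 252)                    # linear approximation
compound_days = break_even_days_compound(0.10, 0.002)
print(f"linear: {linear_days:.2f} days, compound: {compound_days:.2f} days")
```

For costs that are small relative to the expected return, the two answers differ by a fraction of a day, so the linear version is usually adequate for screening.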
Section 14.3: Tax-Loss Harvesting
Tax-loss harvesting strategically realizes losses to offset gains and reduce tax liability.
In this section, you will learn:
- Identifying harvest opportunities
- Wash sale rule compliance
- Calculating tax benefits
14.3.1 Tax-Loss Harvesting System
class TaxLossHarvester:
"""
Implements tax-loss harvesting strategy.
"""
def __init__(self, tax_rate_short: float = 0.37,
tax_rate_long: float = 0.20):
self.tax_rate_short = tax_rate_short
self.tax_rate_long = tax_rate_long
def identify_opportunities(self, positions: list,
min_loss_pct: float = 0.05) -> list:
"""
Find positions with harvestable losses.
Parameters:
-----------
positions : list of dict
Each has: symbol, cost_basis, current_value, purchase_date
min_loss_pct : float
Minimum loss percentage to trigger harvest
"""
opportunities = []
today = datetime.now()
for pos in positions:
gain_loss = pos['current_value'] - pos['cost_basis']
gain_loss_pct = gain_loss / pos['cost_basis']
if gain_loss_pct <= -min_loss_pct:
holding_period = (today - pos['purchase_date']).days
is_long_term = holding_period > 365  # long-term treatment requires holding for more than one year
tax_rate = self.tax_rate_long if is_long_term else self.tax_rate_short
tax_benefit = abs(gain_loss) * tax_rate
opportunities.append({
'symbol': pos['symbol'],
'loss': gain_loss,
'loss_pct': gain_loss_pct,
'holding_days': holding_period,
'is_long_term': is_long_term,
'tax_benefit': tax_benefit
})
return sorted(opportunities, key=lambda x: x['tax_benefit'], reverse=True)
# Example positions
positions = [
{'symbol': 'AAPL', 'cost_basis': 50000, 'current_value': 65000,
'purchase_date': datetime(2023, 1, 15)},
{'symbol': 'MSFT', 'cost_basis': 40000, 'current_value': 38000,
'purchase_date': datetime(2024, 6, 1)},
{'symbol': 'NVDA', 'cost_basis': 20000, 'current_value': 15000,
'purchase_date': datetime(2024, 9, 1)},
]
harvester = TaxLossHarvester()
opportunities = harvester.identify_opportunities(positions)
print("Tax-Loss Harvesting Opportunities")
print("=" * 50)
for opp in opportunities:
print(f"{opp['symbol']}: Loss ${opp['loss']:,.0f} ({opp['loss_pct']:.1%}), "
f"Tax Benefit ${opp['tax_benefit']:,.0f}")
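The harvester above finds candidate losses but does not check the wash sale rule. Under the U.S. rule, a loss is disallowed if the same or a substantially identical security is bought within 30 days before or after the sale. The sketch below is a simplified date check only: `violates_wash_sale` is a hypothetical helper, it assumes the caller passes purchase dates other than the lot being sold, and it does not attempt to decide what counts as "substantially identical".

```python
from datetime import datetime, timedelta

def violates_wash_sale(sale_date: datetime, other_purchase_dates: list,
                       window_days: int = 30) -> bool:
    """True if any other purchase falls within window_days before or after the sale."""
    window = timedelta(days=window_days)
    return any(abs(purchase - sale_date) <= window
               for purchase in other_purchase_dates)

sale = datetime(2024, 12, 10)
blocked = violates_wash_sale(sale, [datetime(2024, 11, 25)])                        # 15 days before
allowed = violates_wash_sale(sale, [datetime(2024, 10, 1), datetime(2025, 1, 15)])  # both outside window
print(blocked, allowed)  # True False
```

In practice a harvester would either wait out the window before repurchasing or hold a similar but not identical replacement in the meantime.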
Exercise 14.3: Tax Benefit Calculator (Guided)
Your Task: Complete the function to calculate net tax savings from harvesting losses.
Fill in the blanks to complete the function:
Solution:
def calculate_tax_savings(realized_gains: float, harvested_losses: float,
                          tax_rate: float = 0.30) -> dict:
    """
    Calculate tax savings from loss harvesting.
    """
    # Calculate tax without harvesting
    tax_without = realized_gains * tax_rate
    # Calculate net taxable gains after offsetting losses
    net_gains = max(0, realized_gains - harvested_losses)
    # Calculate tax with harvesting
    tax_with = net_gains * tax_rate
    # Calculate savings
    savings = tax_without - tax_with
    return {
        'tax_without_harvesting': tax_without,
        'tax_with_harvesting': tax_with,
        'tax_savings': savings
    }
# Test
result = calculate_tax_savings(realized_gains=15000, harvested_losses=5000)
print(f"Tax without harvesting: ${result['tax_without_harvesting']:,.0f}")
print(f"Tax with harvesting: ${result['tax_with_harvesting']:,.0f}")
print(f"Tax savings: ${result['tax_savings']:,.0f}")
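When harvested losses exceed realized gains, the `max(0, ...)` in the solution caps the savings at the tax on the gains and silently drops the remainder. The sketch below makes the unused portion explicit so it can be tracked; `split_harvested_losses` is a hypothetical helper that assumes non-negative inputs. How unused losses are treated (for example, carried forward to later years) depends on the tax jurisdiction and is not modeled here.

```python
def split_harvested_losses(realized_gains: float, harvested_losses: float) -> dict:
    """Split harvested losses into the part that offsets gains and the remainder."""
    used = min(realized_gains, harvested_losses)
    return {
        'losses_used': used,
        'losses_unused': harvested_losses - used,
    }

partial = split_harvested_losses(realized_gains=15000, harvested_losses=5000)
excess = split_harvested_losses(realized_gains=5000, harvested_losses=15000)
print(partial)  # {'losses_used': 5000, 'losses_unused': 0}
print(excess)   # {'losses_used': 5000, 'losses_unused': 10000}
```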
Exercise 14.4: Hybrid Rebalancer (Open-ended)
Your Task:
Build a HybridRebalancer class that:
- Checks on a calendar schedule (e.g., monthly)
- Only rebalances if max drift exceeds a trigger threshold
- When rebalancing, only trades assets drifted beyond a trade threshold
- Returns the final portfolio value, total costs, and number of rebalances
Your implementation:
Solution:
class HybridRebalancer:
"""
Combines calendar checks with threshold triggers and partial rebalancing.
"""
def __init__(self, target_weights: dict, check_frequency: str = 'M',
trigger_threshold: float = 0.05, trade_threshold: float = 0.02):
self.target_weights = target_weights
self.check_frequency = check_frequency
self.trigger_threshold = trigger_threshold
self.trade_threshold = trade_threshold
self.assets = list(target_weights.keys())
def backtest(self, returns_df: pd.DataFrame, initial_value: float = 100000,
transaction_cost: float = 0.001) -> tuple:
portfolio_value = initial_value
holdings = {a: self.target_weights[a] * portfolio_value for a in self.assets}
values = []
total_costs = 0
num_rebalances = 0
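# Use the last actual trading day of each period; the resampled index holds calendar period-ends that may not be trading days
check_dates = set(returns_df.index.to_series().resample(self.check_frequency).last().dropna())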
for date in returns_df.index:
for asset in self.assets:
holdings[asset] *= (1 + returns_df.loc[date, asset])
portfolio_value = sum(holdings.values())
current_weights = {a: holdings[a]/portfolio_value for a in self.assets}
if date in check_dates:
max_drift = max(abs(current_weights[a] - self.target_weights[a])
for a in self.assets)
if max_drift >= self.trigger_threshold:
new_weights = current_weights.copy()
for asset in self.assets:
drift = abs(current_weights[asset] - self.target_weights[asset])
if drift >= self.trade_threshold:
new_weights[asset] = self.target_weights[asset]
total = sum(new_weights.values())
new_weights = {a: w/total for a, w in new_weights.items()}
turnover = sum(abs(new_weights[a] - current_weights[a])
for a in self.assets) / 2
cost = turnover * portfolio_value * transaction_cost
total_costs += cost
num_rebalances += 1
portfolio_value -= cost
holdings = {a: new_weights[a] * portfolio_value for a in self.assets}
values.append({'date': date, 'value': portfolio_value})
return pd.DataFrame(values).set_index('date'), total_costs, num_rebalances
# Test
hybrid = HybridRebalancer(target_weights, check_frequency='M',
trigger_threshold=0.05, trade_threshold=0.02)
values, costs, num_rebal = hybrid.backtest(returns)
print(f"Final Value: ${values['value'].iloc[-1]:,.0f}")
print(f"Total Costs: ${costs:,.0f}")
print(f"Rebalances: {num_rebal}")
Section 14.4: Implementation Shortfall
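Implementation shortfall measures the total cost of executing a trading decision: the gap between the return of a paper portfolio that trades instantly at the decision price and the return actually achieved.
In this section, you will learn:
- Components of implementation shortfall
- VWAP and TWAP execution
- Measuring execution quality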
14.4.1 Implementation Shortfall Analysis
def calculate_implementation_shortfall(order: dict) -> dict:
"""
Calculate implementation shortfall components.
Parameters:
-----------
order : dict
decision_price, arrival_price, execution_price,
close_price, shares_ordered, shares_filled, side
"""
side_mult = 1 if order['side'] == 'buy' else -1
# Delay cost: decision to arrival
delay_cost = side_mult * (
order['arrival_price'] - order['decision_price']
) / order['decision_price']
# Trading cost: arrival to execution
trading_cost = side_mult * (
order['execution_price'] - order['arrival_price']
) / order['decision_price']
# Opportunity cost: unfilled portion
unfilled = order['shares_ordered'] - order['shares_filled']
if unfilled > 0:
opp_cost = side_mult * unfilled / order['shares_ordered'] * (
order['close_price'] - order['decision_price']
) / order['decision_price']
else:
opp_cost = 0
return {
'delay_cost_bps': delay_cost * 10000,
'trading_cost_bps': trading_cost * 10000,
'opportunity_cost_bps': opp_cost * 10000,
'total_shortfall_bps': (delay_cost + trading_cost + opp_cost) * 10000
}
# Example
order = {
'side': 'buy',
'decision_price': 185.00,
'arrival_price': 185.20,
'execution_price': 185.45,
'close_price': 186.00,
'shares_ordered': 1000,
'shares_filled': 1000
}
shortfall = calculate_implementation_shortfall(order)
print("Implementation Shortfall")
print("=" * 40)
for component, value in shortfall.items():
print(f"{component}: {value:.2f}")
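Besides the decision price, executions are often compared against VWAP (volume-weighted average price) and worked with a TWAP (time-weighted) schedule that splits an order evenly across time slices. The sketch below shows both under simple assumptions; the function names and the intraday bar data are hypothetical and chosen only to illustrate the calculations.

```python
import numpy as np

def vwap(prices, volumes) -> float:
    """Volume-weighted average price over a set of bars."""
    prices = np.asarray(prices, dtype=float)
    volumes = np.asarray(volumes, dtype=float)
    return float((prices * volumes).sum() / volumes.sum())

def twap_schedule(total_shares: int, n_slices: int) -> list:
    """Split an order into near-equal child orders, one per time slice."""
    base, remainder = divmod(total_shares, n_slices)
    return [base + (1 if i < remainder else 0) for i in range(n_slices)]

def slippage_vs_benchmark_bps(execution_price: float, benchmark_price: float,
                              side: str) -> float:
    """Positive values mean the fill was worse than the benchmark."""
    sign = 1 if side == 'buy' else -1
    return sign * (execution_price - benchmark_price) / benchmark_price * 10_000

bar_prices = [185.10, 185.30, 185.50, 185.40]   # hypothetical intraday bars
bar_volumes = [2000, 1000, 1000, 4000]
market_vwap = vwap(bar_prices, bar_volumes)
schedule = twap_schedule(1000, 3)
vs_vwap = slippage_vs_benchmark_bps(185.45, market_vwap, 'buy')

print(f"VWAP: {market_vwap:.3f}")             # VWAP: 185.325
print(f"TWAP schedule: {schedule}")           # TWAP schedule: [334, 333, 333]
print(f"Slippage vs VWAP: {vs_vwap:.2f} bps")
```

A fill can look good against VWAP and still show a large implementation shortfall if the price moved between the decision and the start of trading, so the two measures answer different questions.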
Exercise 14.5: Slippage Calculator (Open-ended)
Your Task:
Build a function that calculates slippage for a list of trades:
- For buys: slippage = (actual - expected) / expected
- For sells: slippage = (expected - actual) / expected
- Return total slippage in dollars and weighted average in basis points
Your implementation:
Solution:
def calculate_slippage(trades: list) -> dict:
"""
Calculate slippage for a list of trades.
Parameters:
-----------
trades : list of dict
Each has: expected_price, actual_price, shares, side
"""
total_expected = 0
total_slippage_dollars = 0
for trade in trades:
expected = trade['expected_price']
actual = trade['actual_price']
shares = trade['shares']
side = trade['side']
trade_value = expected * shares
total_expected += trade_value
if side == 'buy':
slippage = (actual - expected) * shares
else:
slippage = (expected - actual) * shares
total_slippage_dollars += slippage
weighted_slippage_bps = (total_slippage_dollars / total_expected) * 10000
return {
'total_slippage_dollars': total_slippage_dollars,
'weighted_slippage_bps': weighted_slippage_bps,
'total_trade_value': total_expected
}
# Test
trades = [
{'expected_price': 100.00, 'actual_price': 100.05, 'shares': 500, 'side': 'buy'},
{'expected_price': 150.00, 'actual_price': 150.10, 'shares': 300, 'side': 'buy'},
{'expected_price': 75.00, 'actual_price': 74.90, 'shares': 800, 'side': 'sell'},
]
result = calculate_slippage(trades)
print(f"Total Slippage: ${result['total_slippage_dollars']:.2f}")
print(f"Weighted Slippage: {result['weighted_slippage_bps']:.2f} bps")
Exercise 14.6: Complete Rebalancing Engine (Open-ended)
Your Task:
Build a RebalancingEngine class that:
- Takes target weights and configuration options in the constructor
- Has an analyze_portfolio() method that returns current weights, drift, and whether rebalancing is needed
- Has a generate_trades() method that creates a trade list to reach targets
- Has an execute_rebalance() method that simulates or executes the rebalance
Your implementation:
Solution:
class RebalancingEngine:
"""
Production-ready portfolio rebalancing engine.
"""
def __init__(self, target_weights: dict, config: dict = None):
self.target_weights = target_weights
self.config = config or {
'rebalance_threshold': 0.05,
'trade_threshold': 0.02,
'transaction_cost_bps': 10
}
self.assets = list(target_weights.keys())
def analyze_portfolio(self, holdings: dict, prices: dict) -> dict:
"""Analyze current portfolio state."""
values = {s: holdings[s] * prices[s] for s in holdings}
total_value = sum(values.values())
current_weights = {s: v/total_value for s, v in values.items()}
drift = {s: current_weights.get(s, 0) - self.target_weights[s]
for s in self.target_weights}
max_drift = max(abs(d) for d in drift.values())
return {
'total_value': total_value,
'current_weights': current_weights,
'drift': drift,
'max_drift': max_drift,
'needs_rebalance': max_drift > self.config['rebalance_threshold']
}
def generate_trades(self, holdings: dict, prices: dict) -> list:
"""Generate trade list."""
analysis = self.analyze_portfolio(holdings, prices)
trades = []
for symbol in self.target_weights:
drift = analysis['drift'].get(symbol, 0)
if abs(drift) < self.config['trade_threshold']:
continue
target_value = self.target_weights[symbol] * analysis['total_value']
current_value = analysis['current_weights'].get(symbol, 0) * analysis['total_value']
trade_value = target_value - current_value
shares = int(trade_value / prices[symbol])
if shares != 0:
trades.append({
'symbol': symbol,
'shares': shares,
'side': 'buy' if shares > 0 else 'sell',
'value': abs(shares * prices[symbol])
})
return sorted(trades, key=lambda x: (x['side'] == 'buy', -x['value']))
def execute_rebalance(self, holdings: dict, prices: dict, dry_run: bool = True) -> dict:
"""Execute rebalance."""
analysis = self.analyze_portfolio(holdings, prices)
trades = self.generate_trades(holdings, prices)
total_turnover = sum(t['value'] for t in trades)
total_cost = total_turnover * self.config['transaction_cost_bps'] / 10000
return {
'portfolio_value': analysis['total_value'],
'max_drift_before': analysis['max_drift'],
'num_trades': len(trades),
'total_turnover': total_turnover,
'estimated_cost': total_cost,
'trades': trades,
'dry_run': dry_run
}
# Test
engine = RebalancingEngine(target_weights)
holdings = {'SPY': 100, 'AGG': 200, 'GLD': 40, 'VNQ': 80}
prices = {'SPY': 480, 'AGG': 98, 'GLD': 210, 'VNQ': 85}
result = engine.execute_rebalance(holdings, prices, dry_run=True)
print(f"Portfolio Value: ${result['portfolio_value']:,.0f}")
print(f"Max Drift: {result['max_drift_before']:.1%}")
print(f"Trades: {result['num_trades']}")
print(f"Estimated Cost: ${result['estimated_cost']:.2f}")
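Because `generate_trades` truncates to whole shares and skips assets inside the trade threshold, the portfolio will not land exactly on target and some cash is usually left over. The sketch below applies a trade list to the holdings and reports post-trade weights including cash. `apply_trades` and `weights_with_cash` are hypothetical helpers, and the trade list is a hand-written example in the engine's format (signed `shares`, positive for buys), not captured output.

```python
def apply_trades(holdings: dict, prices: dict, trades: list, cash: float = 0.0):
    """Apply signed share trades to holdings and track the resulting cash balance."""
    new_holdings = dict(holdings)
    for trade in trades:
        symbol, shares = trade['symbol'], trade['shares']
        new_holdings[symbol] = new_holdings.get(symbol, 0) + shares
        cash -= shares * prices[symbol]     # buys consume cash, sells release it
    return new_holdings, cash

def weights_with_cash(holdings: dict, prices: dict, cash: float) -> dict:
    """Portfolio weights with residual cash reported as its own line."""
    values = {s: q * prices[s] for s, q in holdings.items()}
    total = sum(values.values()) + cash
    weights = {s: v / total for s, v in values.items()}
    weights['CASH'] = cash / total
    return weights

demo_holdings = {'SPY': 100, 'AGG': 200, 'GLD': 40, 'VNQ': 80}
demo_prices = {'SPY': 480, 'AGG': 98, 'GLD': 210, 'VNQ': 85}
demo_trades = [{'symbol': 'SPY', 'shares': -13},   # hypothetical sell
               {'symbol': 'AGG', 'shares': 53}]    # hypothetical buy

new_holdings, residual_cash = apply_trades(demo_holdings, demo_prices, demo_trades)
post_weights = weights_with_cash(new_holdings, demo_prices, residual_cash)
print(f"Residual cash: ${residual_cash:,.0f}")     # Residual cash: $1,046
for symbol, w in post_weights.items():
    print(f"  {symbol}: {w:.2%}")
```

Checking post-trade weights this way shows how much drift remains after a partial rebalance and whether leftover cash needs to be swept into an underweight asset.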
Module Project: Production Rebalancing System
Build a comprehensive rebalancing system that combines all concepts.
Your Challenge:
Build a ProductionRebalancer class that includes:
1. Multiple rebalancing strategies (calendar, threshold, hybrid)
2. Transaction cost modeling
3. Tax-loss harvesting integration
4. Execution quality tracking
# YOUR CODE HERE - Module Project
Solution:
class ProductionRebalancer:
"""
Production-ready rebalancing system.
"""
def __init__(self, target_weights: dict, config: dict = None):
self.target_weights = target_weights
self.config = config or {
'strategy': 'hybrid',
'check_frequency': 'M',
'trigger_threshold': 0.05,
'trade_threshold': 0.02,
'transaction_cost_bps': 10,
'enable_tax_harvesting': True,
'min_harvest_loss_pct': 0.05,
'tax_rate': 0.30
}
self.assets = list(target_weights.keys())
self.rebalance_history = []
def analyze_portfolio(self, holdings: dict, prices: dict,
cost_basis: dict = None) -> dict:
"""Comprehensive portfolio analysis."""
values = {s: holdings.get(s, 0) * prices.get(s, 0) for s in self.assets}
total_value = sum(values.values())
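if total_value == 0:
# Return every key that generate_trades and execute_rebalance read, so an empty portfolio does not raise KeyError
return {'total_value': 0, 'current_weights': {}, 'drift': {}, 'max_drift': 0.0,
'needs_rebalance': False, 'harvest_opportunities': []}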
current_weights = {s: v/total_value for s, v in values.items()}
drift = {s: current_weights[s] - self.target_weights[s] for s in self.assets}
max_drift = max(abs(d) for d in drift.values())
# Tax-loss harvesting opportunities
harvest_opportunities = []
if self.config['enable_tax_harvesting'] and cost_basis:
for symbol in self.assets:
if symbol in cost_basis and symbol in holdings:
current_value = values[symbol]
basis = cost_basis[symbol] * holdings[symbol]
gain_loss_pct = (current_value - basis) / basis if basis > 0 else 0
if gain_loss_pct <= -self.config['min_harvest_loss_pct']:
loss = current_value - basis
harvest_opportunities.append({
'symbol': symbol,
'loss': loss,
'tax_benefit': abs(loss) * self.config['tax_rate']
})
return {
'total_value': total_value,
'current_weights': current_weights,
'drift': drift,
'max_drift': max_drift,
'needs_rebalance': max_drift > self.config['trigger_threshold'],
'harvest_opportunities': harvest_opportunities
}
def generate_trades(self, holdings: dict, prices: dict,
analysis: dict = None) -> list:
"""Generate optimal trade list."""
        if analysis is None:
            analysis = self.analyze_portfolio(holdings, prices)
        trades = []
        total_value = analysis['total_value']
        for symbol in self.assets:
            drift = analysis['drift'].get(symbol, 0)
            # Skip assets whose drift is inside the no-trade band
            if abs(drift) < self.config['trade_threshold']:
                continue
            target_value = self.target_weights[symbol] * total_value
            current_value = analysis['current_weights'].get(symbol, 0) * total_value
            trade_value = target_value - current_value
            # int() truncates toward zero, so share counts are rounded down in magnitude
            shares = int(trade_value / prices[symbol]) if prices[symbol] > 0 else 0
            if shares != 0:
                cost_bps = self.config['transaction_cost_bps']
                est_cost = abs(shares * prices[symbol]) * cost_bps / 10000
                trades.append({
                    'symbol': symbol,
                    'shares': shares,
                    'side': 'buy' if shares > 0 else 'sell',
                    'price': prices[symbol],
                    'value': abs(shares * prices[symbol]),
                    'estimated_cost': est_cost
                })
        # Sells sort before buys; within each side, largest trade value first
        return sorted(trades, key=lambda x: (x['side'] == 'buy', -x['value']))

    def execute_rebalance(self, holdings: dict, prices: dict,
                          cost_basis: dict = None, dry_run: bool = True) -> dict:
        """Execute a rebalance operation."""
        analysis = self.analyze_portfolio(holdings, prices, cost_basis)
        trades = self.generate_trades(holdings, prices, analysis)
        total_turnover = sum(t['value'] for t in trades)
        total_cost = sum(t['estimated_cost'] for t in trades)
        total_harvest = sum(o['tax_benefit'] for o in analysis.get('harvest_opportunities', []))
        summary = {
            'timestamp': datetime.now(),
            'portfolio_value': analysis['total_value'],
            'max_drift_before': analysis['max_drift'],
            'num_trades': len(trades),
            'total_turnover': total_turnover,
            'turnover_pct': total_turnover / analysis['total_value'] if analysis['total_value'] > 0 else 0,
            'estimated_cost': total_cost,
            'cost_bps': total_cost / analysis['total_value'] * 10000 if analysis['total_value'] > 0 else 0,
            'tax_harvest_benefit': total_harvest,
            'trades': trades,
            'dry_run': dry_run
        }
        # Only executed (non-dry-run) rebalances are recorded in the history
        if not dry_run:
            self.rebalance_history.append(summary)
        return summary

    def print_report(self, summary: dict):
        """Print formatted rebalance report."""
        print("=" * 50)
        print("REBALANCE REPORT")
        print("=" * 50)
        print(f"Status: {'DRY RUN' if summary['dry_run'] else 'EXECUTED'}")
        print(f"Portfolio Value: ${summary['portfolio_value']:,.0f}")
        print(f"Max Drift: {summary['max_drift_before']:.1%}")
        print(f"\nTrades: {summary['num_trades']}")
        print(f"Turnover: ${summary['total_turnover']:,.0f} ({summary['turnover_pct']:.1%})")
        print(f"Est. Cost: ${summary['estimated_cost']:.2f} ({summary['cost_bps']:.1f} bps)")
        if summary['tax_harvest_benefit'] > 0:
            print(f"Tax Benefit: ${summary['tax_harvest_benefit']:,.0f}")
        print("=" * 50)
# Demo
rebalancer = ProductionRebalancer(target_weights)
holdings = {'SPY': 100, 'AGG': 200, 'GLD': 40, 'VNQ': 80}
prices = {'SPY': 480, 'AGG': 98, 'GLD': 210, 'VNQ': 85}
cost_basis = {'SPY': 400, 'AGG': 105, 'GLD': 190, 'VNQ': 95}
result = rebalancer.execute_rebalance(holdings, prices, cost_basis, dry_run=True)
rebalancer.print_report(result)
# Solution - Module Project (simplified demo)
class ProductionRebalancer:
    def __init__(self, target_weights: dict):
        self.target_weights = target_weights
        self.assets = list(target_weights.keys())

    def analyze_and_rebalance(self, holdings: dict, prices: dict) -> dict:
        values = {s: holdings.get(s, 0) * prices.get(s, 0) for s in self.assets}
        total = sum(values.values())
        # Guard against division by zero for an empty or zero-value portfolio
        weights = {s: (v / total if total > 0 else 0.0) for s, v in values.items()}
        drift = {s: abs(weights[s] - self.target_weights[s]) for s in self.assets}
        max_drift = max(drift.values())
        return {
            'total_value': total,
            'weights': weights,
            'max_drift': max_drift,
            'needs_rebalance': max_drift > 0.05  # 5% absolute drift threshold
        }
# Demo
rebalancer = ProductionRebalancer(target_weights)
result = rebalancer.analyze_and_rebalance(
{'SPY': 100, 'AGG': 200, 'GLD': 40, 'VNQ': 80},
{'SPY': 480, 'AGG': 98, 'GLD': 210, 'VNQ': 85}
)
print(f"Portfolio Value: ${result['total_value']:,.0f}")
print(f"Max Drift: {result['max_drift']:.1%}")
print(f"Needs Rebalance: {result['needs_rebalance']}")
Key Takeaways
What You Learned
1. Rebalancing Strategies
- Calendar rebalancing: simple but may over/under-trade
- Threshold rebalancing: trades only when drift exceeds limit
- Hybrid approaches combine benefits of both
2. Transaction Costs
- Include explicit (commissions) and implicit (spread, impact) costs
- Market impact grows with trade size (square-root model)
- Break-even analysis helps determine minimum holding periods
3. Tax-Loss Harvesting
- Can add 0.5-1% annually to after-tax returns
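- Must avoid triggering the wash sale rule (no repurchase of the same or a substantially identical security within 30 days before or after the sale)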
- Use correlated substitutes to maintain exposure
4. Implementation Shortfall
- Measures total cost of trading decisions
- Components: delay, trading, opportunity costs
- VWAP/TWAP algorithms help minimize impact
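The implementation shortfall components listed above can be made concrete with a small calculation. The sketch below is illustrative only: it assumes a buy order, ignores commissions, and uses one common convention for the split (delay from decision price to arrival price, trading from arrival price to average fill, opportunity on the unfilled shares). The function name and all prices are hypothetical.

```python
def implementation_shortfall(decision_price: float, arrival_price: float,
                             avg_exec_price: float, final_price: float,
                             shares_ordered: int, shares_executed: int) -> dict:
    """Decompose implementation shortfall for a BUY order (dollars, no fees)."""
    unfilled = shares_ordered - shares_executed
    # Delay: price drift between the decision and the order reaching the market
    delay = shares_executed * (arrival_price - decision_price)
    # Trading: average fill price versus the arrival price (spread plus impact)
    trading = shares_executed * (avg_exec_price - arrival_price)
    # Opportunity: gain missed on shares that were never filled
    opportunity = unfilled * (final_price - decision_price)
    total = delay + trading + opportunity
    paper_value = shares_ordered * decision_price
    return {
        'delay': delay,
        'trading': trading,
        'opportunity': opportunity,
        'total': total,
        'total_bps': total / paper_value * 10000,
    }

# Hypothetical order: decide at $100.00, arrive at $100.10, fill 8,000 of
# 10,000 shares at an average of $100.25, and the stock ends at $101.00
result = implementation_shortfall(100.00, 100.10, 100.25, 101.00, 10_000, 8_000)
for name, value in result.items():
    print(f"{name}: {value:,.2f}")
```

With these inputs the three components come to roughly $800, $1,200 and $2,000, or about 40 bps of the paper portfolio value. For a sell order the signs of the price differences would flip.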
Coming Up Next
In Module 15: Market Microstructure, we'll explore:
- Order books and price formation
- Bid-ask spread dynamics
- Market maker behavior
- Optimal execution strategies
Congratulations on completing Module 14!
Module 15: Market Microstructure
Course 3: Quantitative Finance & Portfolio Theory
Part 5: Production & Infrastructure
Learning Objectives
By the end of this module, you will be able to:
- Understand limit order book mechanics and price-time priority
- Analyze bid-ask spread components and estimate spreads
- Model price impact using square-root and Almgren-Chriss models
- Implement optimal execution algorithms (TWAP, VWAP, IS)
| Attribute | Value |
|---|---|
| Duration | ~2.5 hours |
| Exercises | 6 (3 guided + 3 open-ended) |
| Prerequisites | Module 14 (Rebalancing & Execution) |
Setup and Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from collections import defaultdict
from datetime import datetime, timedelta
from dataclasses import dataclass
from enum import Enum
from typing import List, Dict, Optional, Tuple
import warnings
warnings.filterwarnings('ignore')
pd.set_option('display.float_format', lambda x: f'{x:.4f}')
np.set_printoptions(precision=4)
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
print("Module 15: Market Microstructure")
print("=" * 40)
Section 15.1: Order Book Mechanics
Modern electronic markets are organized around the limit order book (LOB) - a collection of buy and sell orders at various prices.
In this section, you will learn:
- Order types (market, limit, stop)
- Price-time priority matching
- Book imbalance as a directional signal
Order Types
| Order Type | Description | Execution |
|---|---|---|
| Market | Execute immediately at best available price | Certain execution, uncertain price |
| Limit | Execute only at specified price or better | Uncertain execution, certain price |
| Stop | Becomes a market order when price reaches the trigger | Executes like a market order once triggered; typically used for risk management |
Order Book Structure
SELL SIDE (Asks) | BUY SIDE (Bids)
Price Quantity | Price Quantity
$100.10 500 <-- Best Ask | $100.05 800 <-- Best Bid
$100.15 300 | $100.00 1200
$100.20 1000 | $99.95 400
The spread is the gap between best bid and best ask.
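As a quick worked example, the snippet below computes the spread, midpoint, and relative spread for the best quotes in the diagram above ($100.05 bid, $100.10 ask). The variable names are illustrative.

```python
# Best quotes taken from the order book diagram above
best_bid = 100.05
best_ask = 100.10

spread = best_ask - best_bid            # absolute (quoted) spread in dollars
midpoint = (best_bid + best_ask) / 2    # midpoint price
spread_bps = spread / midpoint * 10000  # spread relative to midpoint, in basis points

print(f"Spread:   ${spread:.2f}")
print(f"Midpoint: ${midpoint:.3f}")
print(f"Relative: {spread_bps:.2f} bps")
```

Here the spread is five cents on a midpoint of $100.075, or about 5 bps. Expressing the spread in basis points makes it comparable across stocks with different price levels.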
class OrderSide(Enum):
    BUY = "buy"
    SELL = "sell"

class OrderType(Enum):
    MARKET = "market"
    LIMIT = "limit"

@dataclass
class Order:
    """Represents a single order."""
    order_id: int
    side: OrderSide
    order_type: OrderType
    price: Optional[float]  # None for market orders
    quantity: int
    timestamp: datetime

@dataclass
class Trade:
    """Represents an executed trade."""
    trade_id: int
    price: float
    quantity: int
    aggressor_side: OrderSide
    timestamp: datetime
class LimitOrderBook:
    """
    A simple limit order book implementation.

    Supports:
    - Adding limit orders
    - Market orders (immediate execution)
    - Order cancellation
    - Price-time priority matching
    """

    def __init__(self, tick_size: float = 0.01):
        self.tick_size = tick_size
        self.bids = defaultdict(list)  # price -> FIFO list of resting buy orders
        self.asks = defaultdict(list)  # price -> FIFO list of resting sell orders
        self.orders = {}
        self.trades = []
        self._order_counter = 0
        self._trade_counter = 0

    def _round_price(self, price: float) -> float:
        return round(price / self.tick_size) * self.tick_size

    def best_bid(self) -> Optional[float]:
        # Emptied price levels stay in the dict, so filter them out;
        # default=None avoids a ValueError when every level is empty
        return max((p for p in self.bids if self.bids[p]), default=None)

    def best_ask(self) -> Optional[float]:
        return min((p for p in self.asks if self.asks[p]), default=None)

    def spread(self) -> Optional[float]:
        bid, ask = self.best_bid(), self.best_ask()
        if bid is None or ask is None:
            return None
        return ask - bid

    def midpoint(self) -> Optional[float]:
        bid, ask = self.best_bid(), self.best_ask()
        if bid is None or ask is None:
            return None
        return (bid + ask) / 2
    def add_order(self, side: OrderSide, order_type: OrderType,
                  quantity: int, price: Optional[float] = None) -> Tuple[Order, List[Trade]]:
        self._order_counter += 1
        timestamp = datetime.now()
        if price is not None:
            price = self._round_price(price)
        order = Order(
            order_id=self._order_counter,
            side=side,
            order_type=order_type,
            price=price,
            quantity=quantity,
            timestamp=timestamp
        )
        trades = []
        if order_type == OrderType.MARKET:
            trades = self._execute_market_order(order)
        else:
            trades = self._match_order(order)
            # Any unfilled remainder of a limit order rests in the book
            if order.quantity > 0:
                self.orders[order.order_id] = order
                if side == OrderSide.BUY:
                    self.bids[price].append(order)
                else:
                    self.asks[price].append(order)
        return order, trades
    def _execute_market_order(self, order: Order) -> List[Trade]:
        trades = []
        remaining = order.quantity
        # Buys walk the asks from lowest price up; sells walk the bids from highest down
        if order.side == OrderSide.BUY:
            book_side = self.asks
            price_order = sorted
        else:
            book_side = self.bids
            price_order = lambda x: sorted(x, reverse=True)
        for price in list(price_order(book_side.keys())):
            if remaining <= 0:
                break
            orders_at_price = book_side[price]
            while orders_at_price and remaining > 0:
                resting_order = orders_at_price[0]  # oldest order at this price (time priority)
                fill_qty = min(remaining, resting_order.quantity)
                self._trade_counter += 1
                trade = Trade(
                    trade_id=self._trade_counter,
                    price=resting_order.price,
                    quantity=fill_qty,
                    aggressor_side=order.side,
                    timestamp=datetime.now()
                )
                trades.append(trade)
                self.trades.append(trade)
                remaining -= fill_qty
                resting_order.quantity -= fill_qty
                if resting_order.quantity == 0:
                    orders_at_price.pop(0)
                    del self.orders[resting_order.order_id]
        order.quantity = remaining
        return trades
    def _match_order(self, order: Order) -> List[Trade]:
        trades = []
        if order.side == OrderSide.BUY:
            while order.quantity > 0 and self.asks:
                best_ask_price = self.best_ask()
                # Stop once the best ask is above the buy limit price
                if best_ask_price is None or order.price < best_ask_price:
                    break
                orders_at_price = self.asks[best_ask_price]
                if not orders_at_price:
                    break
                resting_order = orders_at_price[0]
                fill_qty = min(order.quantity, resting_order.quantity)
                self._trade_counter += 1
                trade = Trade(
                    trade_id=self._trade_counter,
                    price=resting_order.price,
                    quantity=fill_qty,
                    aggressor_side=order.side,
                    timestamp=datetime.now()
                )
                trades.append(trade)
                self.trades.append(trade)
                order.quantity -= fill_qty
                resting_order.quantity -= fill_qty
                if resting_order.quantity == 0:
                    orders_at_price.pop(0)
                    del self.orders[resting_order.order_id]
        else:
            while order.quantity > 0 and self.bids:
                best_bid_price = self.best_bid()
                # Stop once the best bid is below the sell limit price
                if best_bid_price is None or order.price > best_bid_price:
                    break
                orders_at_price = self.bids[best_bid_price]
                if not orders_at_price:
                    break
                resting_order = orders_at_price[0]
                fill_qty = min(order.quantity, resting_order.quantity)
                self._trade_counter += 1
                trade = Trade(
                    trade_id=self._trade_counter,
                    price=resting_order.price,
                    quantity=fill_qty,
                    aggressor_side=order.side,
                    timestamp=datetime.now()
                )
                trades.append(trade)
                self.trades.append(trade)
                order.quantity -= fill_qty
                resting_order.quantity -= fill_qty
                if resting_order.quantity == 0:
                    orders_at_price.pop(0)
                    del self.orders[resting_order.order_id]
        return trades
    def cancel_order(self, order_id: int) -> bool:
        if order_id not in self.orders:
            return False
        order = self.orders[order_id]
        if order.side == OrderSide.BUY:
            self.bids[order.price].remove(order)
        else:
            self.asks[order.price].remove(order)
        del self.orders[order_id]
        return True

    def get_book_state(self, levels: int = 5) -> Dict:
        bid_prices = sorted([p for p in self.bids if self.bids[p]], reverse=True)[:levels]
        ask_prices = sorted([p for p in self.asks if self.asks[p]])[:levels]
        bids = [(p, sum(o.quantity for o in self.bids[p])) for p in bid_prices]
        asks = [(p, sum(o.quantity for o in self.asks[p])) for p in ask_prices]
        return {
            'bids': bids,
            'asks': asks,
            'best_bid': self.best_bid(),
            'best_ask': self.best_ask(),
            'spread': self.spread(),
            'midpoint': self.midpoint()
        }

    def display(self):
        state = self.get_book_state()
        print("\n" + "="*50)
        print("ORDER BOOK")
        print("="*50)
        # Compare against None explicitly so a zero value is not shown as N/A
        print(f"Spread: ${state['spread']:.2f}" if state['spread'] is not None else "Spread: N/A")
        print(f"Midpoint: ${state['midpoint']:.2f}" if state['midpoint'] is not None else "Midpoint: N/A")
        print("-"*50)
        print(f"{'ASK':^25} | {'BID':^22}")
        print(f"{'Price':>12} {'Qty':>10} | {'Price':>10} {'Qty':>10}")
        print("-"*50)
        asks = state['asks'][::-1]
        bids = state['bids']
        max_rows = max(len(asks), len(bids))
        for i in range(max_rows):
            ask_str = f"${asks[i][0]:>10.2f} {asks[i][1]:>10,}" if i < len(asks) else " "*23
            bid_str = f"${bids[i][0]:>9.2f} {bids[i][1]:>10,}" if i < len(bids) else " "*22
            print(f"{ask_str} | {bid_str}")
        print("="*50)
# Create and populate an order book
book = LimitOrderBook(tick_size=0.01)
# Add buy orders (bids)
book.add_order(OrderSide.BUY, OrderType.LIMIT, 500, 100.00)
book.add_order(OrderSide.BUY, OrderType.LIMIT, 800, 99.95)
book.add_order(OrderSide.BUY, OrderType.LIMIT, 1200, 99.90)
book.add_order(OrderSide.BUY, OrderType.LIMIT, 300, 99.85)
book.add_order(OrderSide.BUY, OrderType.LIMIT, 600, 99.80)
# Add sell orders (asks)
book.add_order(OrderSide.SELL, OrderType.LIMIT, 400, 100.05)
book.add_order(OrderSide.SELL, OrderType.LIMIT, 700, 100.10)
book.add_order(OrderSide.SELL, OrderType.LIMIT, 1000, 100.15)
book.add_order(OrderSide.SELL, OrderType.LIMIT, 500, 100.20)
book.add_order(OrderSide.SELL, OrderType.LIMIT, 900, 100.25)
print("Initial Order Book State:")
book.display()
# Simulate market order execution
mid_before = book.midpoint()  # capture the midpoint before the order hits the book
print("Submitting: BUY 600 shares at MARKET")
order, trades = book.add_order(OrderSide.BUY, OrderType.MARKET, 600)
print(f"\nExecuted {len(trades)} trade(s):")
for t in trades:
    print(f" {t.quantity} shares @ ${t.price:.2f}")
avg_price = sum(t.price * t.quantity for t in trades) / sum(t.quantity for t in trades)
print(f"\nAverage execution price: ${avg_price:.2f}")
print(f"Midpoint was: ${mid_before:.2f}")
print(f"Slippage: ${avg_price - mid_before:.4f}")
print("\nOrder Book After Market Buy:")
book.display()
Exercise 15.1: Book Imbalance Calculator (Guided)
Your Task: Calculate order book imbalance - the normalized difference between bid volume and ask volume.
Imbalance = (Bid Volume - Ask Volume) / (Bid Volume + Ask Volume)
Returns value between -1 (all asks) and +1 (all bids).
Fill in the blanks to complete the function:
Solution:
def calculate_book_imbalance(book: LimitOrderBook, levels: int = 3) -> float:
    """Calculate order book imbalance from top levels."""
    state = book.get_book_state(levels=levels)
    bid_volume = sum(qty for _, qty in state['bids'])
    ask_volume = sum(qty for _, qty in state['asks'])
    total_volume = bid_volume + ask_volume
    if total_volume == 0:
        return 0.0
    imbalance = (bid_volume - ask_volume) / total_volume
    return imbalance

# Test
imbalance = calculate_book_imbalance(book, levels=3)
print(f"Book Imbalance (top 3 levels): {imbalance:.2%}")
if imbalance > 0.1:
    print("Interpretation: More bid volume - bullish pressure")
elif imbalance < -0.1:
    print("Interpretation: More ask volume - bearish pressure")
else:
    print("Interpretation: Balanced book")
Section 15.2: Bid-Ask Spread Analysis
The bid-ask spread is the most fundamental transaction cost. Understanding its components helps predict trading costs.
In this section, you will learn:
- Spread components (order processing, inventory, adverse selection)
- Types of spreads (quoted, effective, realized)
- Roll model for spread estimation
Spread Components
The spread compensates market makers for:
- Order Processing Costs - Fixed costs of maintaining systems
- Inventory Risk - Risk of holding inventory that may lose value
- Adverse Selection - Risk of trading with informed traders
Types of Spreads
- Quoted Spread: Best ask - Best bid
- Effective Spread: 2 × |Trade price - Midpoint|
- Realized Spread: Effective spread minus the subsequent move in the midpoint (the part of the spread the liquidity provider keeps after price impact)
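The three spread measures can be compared on a single trade. The sketch below is a minimal illustration with made-up numbers: the function name, the quotes, and the later midpoint are all hypothetical, and the time at which the later midpoint is sampled (for example, a few minutes after the trade) is a modeling choice rather than something fixed by the definitions above.

```python
def spread_measures(bid: float, ask: float, trade_price: float,
                    side: str, mid_later: float) -> dict:
    """Quoted, effective, and realized spread for a single trade (in dollars)."""
    direction = 1 if side == 'buy' else -1
    mid = (bid + ask) / 2
    quoted = ask - bid
    effective = 2 * abs(trade_price - mid)
    # Subsequent midpoint move in the direction of the trade, doubled to match
    # the round-trip convention used for the effective spread
    price_impact = 2 * direction * (mid_later - mid)
    realized = effective - price_impact
    return {
        'quoted': quoted,
        'effective': effective,
        'price_impact': price_impact,
        'realized': realized,
    }

# Hypothetical buy at $100.04 inside a $100.00 / $100.05 quote;
# the midpoint later drifts up from $100.025 to $100.035
m = spread_measures(bid=100.00, ask=100.05, trade_price=100.04,
                    side='buy', mid_later=100.035)
for name, value in m.items():
    print(f"{name}: ${value:.4f}")
```

In this example the trade executes inside the quote, so the effective spread (about $0.03) is smaller than the quoted spread ($0.05), and the midpoint move leaves a realized spread of roughly $0.01.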
class SpreadAnalyzer:
    """Analyzes bid-ask spread characteristics."""

    def __init__(self):
        self.quotes = []
        self.trades = []

    def add_quote(self, timestamp: datetime, bid: float, ask: float):
        """Record a quote update."""
        self.quotes.append({
            'timestamp': timestamp,
            'bid': bid,
            'ask': ask,
            'midpoint': (bid + ask) / 2,
            'quoted_spread': ask - bid,
            'quoted_spread_bps': (ask - bid) / ((bid + ask) / 2) * 10000
        })

    def add_trade(self, timestamp: datetime, price: float, side: str):
        """Record a trade."""
        # Find the most recent quote at or before the trade time
        quote_at_trade = None
        for q in reversed(self.quotes):
            if q['timestamp'] <= timestamp:
                quote_at_trade = q
                break
        # Trades with no prevailing quote are skipped
        if quote_at_trade:
            midpoint = quote_at_trade['midpoint']
            effective_half_spread = abs(price - midpoint)
            effective_spread = 2 * effective_half_spread
            self.trades.append({
                'timestamp': timestamp,
                'price': price,
                'side': side,
                'midpoint_at_trade': midpoint,
                'effective_spread': effective_spread,
                'effective_spread_bps': effective_spread / midpoint * 10000
            })

    def summary_stats(self) -> Optional[Dict]:
        """Get summary statistics (None if no quotes were recorded)."""
        if not self.quotes:
            return None
        quoted_spreads = [q['quoted_spread_bps'] for q in self.quotes]
        stats = {
            'num_quotes': len(self.quotes),
            'avg_quoted_spread_bps': np.mean(quoted_spreads),
            'median_quoted_spread_bps': np.median(quoted_spreads),
            'min_quoted_spread_bps': np.min(quoted_spreads),
            'max_quoted_spread_bps': np.max(quoted_spreads),
        }
        if self.trades:
            effective_spreads = [t['effective_spread_bps'] for t in self.trades]
            stats['num_trades'] = len(self.trades)
            stats['avg_effective_spread_bps'] = np.mean(effective_spreads)
        return stats
# Simulate quote and trade data
np.random.seed(42)
analyzer = SpreadAnalyzer()
base_time = datetime.now()
base_mid = 100.0
base_spread = 0.05
for i in range(100):
    timestamp = base_time + timedelta(seconds=i*10)
    base_mid += np.random.normal(0, 0.02)
    spread = max(0.01, base_spread + np.random.normal(0, 0.01))
    bid = base_mid - spread/2
    ask = base_mid + spread/2
    analyzer.add_quote(timestamp, bid, ask)
    # Roughly 30% of quote updates are accompanied by a trade
    if np.random.random() < 0.3:
        side = 'buy' if np.random.random() < 0.5 else 'sell'
        price = ask + np.random.uniform(0, 0.01) if side == 'buy' else bid - np.random.uniform(0, 0.01)
        analyzer.add_trade(timestamp, price, side)
stats = analyzer.summary_stats()
print("Spread Analysis Summary")
print("=" * 40)
for key, value in stats.items():
    if 'bps' in key:
        print(f"{key}: {value:.2f}")
    else:
        print(f"{key}: {value}")
# Visualize spread over time
df_quotes = pd.DataFrame(analyzer.quotes)
fig, axes = plt.subplots(2, 1, figsize=(12, 6), sharex=True)
axes[0].fill_between(df_quotes['timestamp'], df_quotes['bid'], df_quotes['ask'],
alpha=0.3, label='Bid-Ask Range')
axes[0].plot(df_quotes['timestamp'], df_quotes['midpoint'],
label='Midpoint', color='blue', linewidth=1)
axes[0].set_ylabel('Price ($)')
axes[0].legend()
axes[0].set_title('Quote Evolution')
axes[1].plot(df_quotes['timestamp'], df_quotes['quoted_spread_bps'], color='red', linewidth=1)
axes[1].axhline(df_quotes['quoted_spread_bps'].mean(), color='red',
linestyle='--', label=f"Mean: {df_quotes['quoted_spread_bps'].mean():.1f} bps")
axes[1].set_ylabel('Spread (bps)')
axes[1].set_xlabel('Time')
axes[1].legend()
axes[1].set_title('Quoted Spread Over Time')
plt.tight_layout()
plt.show()
Exercise 15.2: Roll Spread Estimator (Guided)
Your Task: Estimate the bid-ask spread using the Roll (1984) model.
The Roll model estimates spread from the autocovariance of price changes:
$$\text{Spread} = 2\sqrt{-\text{Cov}(\Delta P_t, \Delta P_{t-1})}$$
(Only valid when covariance is negative)
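The formula follows from a short derivation. The sketch below uses notation introduced here for illustration: $m_t$ is the efficient (midpoint) price, $u_t$ its random-walk innovation, $q_t$ the trade direction, and $s$ the spread. It assumes trade directions are independent over time, equally likely to be buys or sells, and uncorrelated with $u_t$.

```latex
\begin{aligned}
P_t &= m_t + \tfrac{s}{2}\, q_t, \qquad q_t \in \{+1, -1\} \text{ with equal probability} \\
\Delta P_t &= u_t + \tfrac{s}{2}\,(q_t - q_{t-1}) \\
\text{Cov}(\Delta P_t, \Delta P_{t-1})
  &= \tfrac{s^2}{4}\,\text{Cov}(q_t - q_{t-1},\; q_{t-1} - q_{t-2})
   = -\tfrac{s^2}{4}\,\text{Var}(q_{t-1})
   = -\tfrac{s^2}{4} \\
s &= 2\sqrt{-\text{Cov}(\Delta P_t, \Delta P_{t-1})}
\end{aligned}
```

The only surviving term is the shared $q_{t-1}$, which enters consecutive price changes with opposite signs. This bid-ask bounce is why the autocovariance is negative under the model, and why a positive sample autocovariance leaves the estimator undefined.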
Fill in the blanks:
Solution:
def estimate_spread_roll(prices: np.ndarray) -> Optional[float]:
    """Estimate bid-ask spread using Roll (1984) model."""
    prices = np.array(prices)
    # Calculate price changes
    delta_p = np.diff(prices)
    # Calculate first-order autocovariance of price changes
    cov = np.cov(delta_p[1:], delta_p[:-1])[0, 1]
    # Roll model only applies when covariance is negative
    if cov >= 0:
        return None
    # Spread = 2 * sqrt(-cov)
    spread = 2 * np.sqrt(-cov)
    return spread

# Test with simulated trade prices
np.random.seed(123)
true_spread = 0.05
true_mid = 100.0
# Efficient price random walk
efficient_prices = [true_mid]
for _ in range(500):
    efficient_prices.append(efficient_prices[-1] + np.random.normal(0, 0.02))
# Each transaction occurs at the ask or the bid with equal probability
transaction_prices = []
for eff_p in efficient_prices:
    if np.random.random() < 0.5:
        transaction_prices.append(eff_p + true_spread/2)
    else:
        transaction_prices.append(eff_p - true_spread/2)
estimated_spread = estimate_spread_roll(transaction_prices)
print(f"True spread: ${true_spread:.4f}")
print(f"Roll estimate: ${estimated_spread:.4f}" if estimated_spread is not None else "Roll model not applicable")
Section 15.3: Price Impact Models
When you trade, you move the price. This price impact has two components:
- Temporary Impact: Immediate price pressure that reverses
- Permanent Impact: Information revealed by your trade
In this section, you will learn:
- Square-root impact model
- Almgren-Chriss framework
- Impact estimation from trade data
The Square-Root Model
One of the most widely used impact models expresses expected impact as a fraction of the price:
$$\text{Impact} = \sigma \cdot \sqrt{\frac{Q}{V}}$$
Where:
- $\sigma$ = daily volatility
- $Q$ = trade quantity
- $V$ = average daily volume
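A quick numerical check of the formula helps build intuition. The snippet below plugs in illustrative values (2% daily volatility, a 50,000-share order, 1,000,000 shares of average daily volume, and a hypothetical $100 share price); none of these are calibrated to a real stock.

```python
import math

sigma = 0.02     # daily volatility (2%)
Q = 50_000       # trade quantity in shares
V = 1_000_000    # average daily volume in shares
price = 100.0    # hypothetical share price, used only to convert to dollars

impact = sigma * math.sqrt(Q / V)    # expected impact as a fraction of price
impact_bps = impact * 10000          # same quantity in basis points
impact_dollars = impact * price      # approximate cost per share in dollars

print(f"Participation: {Q / V:.1%}")
print(f"Impact: {impact_bps:.1f} bps (${impact_dollars:.3f} per share)")

# Quadrupling the order size only doubles the impact
impact_4x = sigma * math.sqrt(4 * Q / V)
print(f"Impact at 4x size: {impact_4x * 10000:.1f} bps")
```

An order equal to 5% of daily volume is expected to move the price by about 45 bps under these assumptions, and quadrupling the order to 20% of daily volume raises that to about 89 bps rather than 180 bps.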
class PriceImpactModel:
    """Models price impact of trading."""

    def __init__(self, sigma: float = 0.02, avg_daily_volume: int = 1000000):
        self.sigma = sigma
        self.adv = avg_daily_volume

    def square_root_impact(self, quantity: int, permanent_fraction: float = 0.5) -> Dict:
        """Calculate impact using square-root model."""
        participation = quantity / self.adv
        total_impact = self.sigma * np.sqrt(participation)
        # Split total impact into a permanent part and a temporary part
        permanent = total_impact * permanent_fraction
        temporary = total_impact * (1 - permanent_fraction)
        return {
            'total_impact_pct': total_impact,
            'permanent_pct': permanent,
            'temporary_pct': temporary,
            'participation_rate': participation,
            'total_impact_bps': total_impact * 10000,
            'permanent_bps': permanent * 10000,
            'temporary_bps': temporary * 10000
        }

    def almgren_chriss_cost(self, quantity: int, time_horizon: float,
                            risk_aversion: float = 1e-6) -> Dict:
        """Calculate expected cost using Almgren-Chriss model."""
        # Illustrative (uncalibrated) temporary and permanent impact coefficients
        eta = 0.01 * self.sigma / np.sqrt(self.adv)
        gamma = 0.1 * eta
        kappa = np.sqrt(risk_aversion * self.sigma**2 / eta)
        permanent_cost = 0.5 * gamma * quantity**2
        temporary_cost = eta * quantity**2 / (2 * time_horizon)
        timing_risk = 0.5 * risk_aversion * self.sigma**2 * quantity**2 * time_horizon
        total_cost = permanent_cost + temporary_cost + timing_risk
        return {
            'permanent_cost': permanent_cost,
            'temporary_cost': temporary_cost,
            'timing_risk_cost': timing_risk,
            'total_expected_cost': total_cost,
            'cost_per_share': total_cost / quantity,
            'optimal_kappa': kappa
        }
# Example: Impact of different trade sizes
model = PriceImpactModel(sigma=0.02, avg_daily_volume=1_000_000)
print("Price Impact Analysis")
print("=" * 60)
print("Stock: $100, ADV: 1M shares, Daily Volatility: 2%")
print()
print(f"{'Trade Size':>12} {'% ADV':>8} {'Impact (bps)':>14} {'Perm (bps)':>12} {'Temp (bps)':>12}")
print("-" * 60)
for qty in [1000, 5000, 10000, 50000, 100000, 500000]:
    impact = model.square_root_impact(qty)
    print(f"{qty:>12,} {impact['participation_rate']:>7.1%} {impact['total_impact_bps']:>14.1f} "
          f"{impact['permanent_bps']:>12.1f} {impact['temporary_bps']:>12.1f}")
# Visualize impact curve
quantities = np.linspace(1000, 500000, 100)
impacts = [model.square_root_impact(q)['total_impact_bps'] for q in quantities]
participation_rates = [q / model.adv * 100 for q in quantities]
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
axes[0].plot(quantities / 1000, impacts, linewidth=2)
axes[0].set_xlabel('Trade Size (thousands of shares)')
axes[0].set_ylabel('Expected Impact (bps)')
axes[0].set_title('Price Impact vs Trade Size')
axes[0].grid(True, alpha=0.3)
axes[1].plot(participation_rates, impacts, linewidth=2, color='red')
axes[1].set_xlabel('Participation Rate (% of ADV)')
axes[1].set_ylabel('Expected Impact (bps)')
axes[1].set_title('Price Impact vs Participation Rate')
axes[1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
print("\nKey insight: Impact grows with square root of trade size")
print("Trading 4x the volume only doubles the impact")
Exercise 15.3: Optimal Trade Horizon (Guided)
Your Task: Calculate the optimal execution horizon given trade size and urgency.
The trade-off is:
- Trade faster → higher market impact
- Trade slower → more timing/volatility risk
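One way to see this trade-off numerically is to reuse the two horizon-dependent cost terms from `almgren_chriss_cost` above: a temporary impact term that falls as 1/T and a timing risk term that grows with T. The sketch below is a simplified illustration with hypothetical, uncalibrated parameter values; it is not the heuristic used in the exercise solution, and the helper names are made up for this example.

```python
import math

def execution_cost(quantity: float, horizon: float, eta: float,
                   risk_aversion: float, sigma: float) -> float:
    """Horizon-dependent cost: temporary impact (~1/T) plus timing risk (~T)."""
    temporary = eta * quantity**2 / (2 * horizon)
    timing = 0.5 * risk_aversion * sigma**2 * quantity**2 * horizon
    return temporary + timing

def cost_minimizing_horizon(eta: float, risk_aversion: float, sigma: float) -> float:
    """Setting d/dT [a/T + b*T] = 0 gives T* = sqrt(a/b) = sqrt(eta / (lambda * sigma^2))."""
    return math.sqrt(eta / (risk_aversion * sigma**2))

# Hypothetical parameters chosen only to give a readable example
eta, risk_aversion, sigma, quantity = 1e-6, 1e-3, 0.02, 50_000

t_star = cost_minimizing_horizon(eta, risk_aversion, sigma)
costs = {}
for label, horizon in [('half T*', 0.5 * t_star), ('T*', t_star), ('double T*', 2 * t_star)]:
    costs[label] = execution_cost(quantity, horizon, eta, risk_aversion, sigma)
    print(f"{label:>10}: horizon = {horizon:.2f} days, cost = {costs[label]:,.1f}")
```

Under this functional form the cost-minimizing horizon does not depend on order size, because both terms scale with the square of the quantity. A more risk-averse or more volatile setting shortens the horizon, while a larger impact coefficient lengthens it.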
Fill in the blanks:
Solution:
def optimal_trade_horizon(quantity: int, adv: int, volatility: float,
                          urgency: float = 1.0) -> float:
    """
    Calculate optimal execution horizon.
    Returns horizon in trading days.
    """
    participation = quantity / adv
    # Heuristic: larger orders (relative to ADV) get a longer base horizon
    base_horizon = np.sqrt(participation) * 2
    # Higher volatility or higher urgency shortens the horizon
    vol_adjustment = 0.02 / volatility
    urgency_adjustment = 1 / urgency
    optimal_horizon = base_horizon * vol_adjustment * urgency_adjustment
    # Keep the result between 0.1 and 5 trading days
    optimal_horizon = np.clip(optimal_horizon, 0.1, 5.0)
    return optimal_horizon

# Test with different scenarios
print("Optimal Trade Horizons")
print("=" * 60)
scenarios = [
    (50000, 1000000, 0.02, 1.0, "Base case"),
    (50000, 1000000, 0.02, 2.0, "High urgency"),
    (50000, 1000000, 0.04, 1.0, "High volatility"),
    (200000, 1000000, 0.02, 1.0, "Large order"),
    (50000, 5000000, 0.02, 1.0, "Liquid stock"),
]
for qty, adv, vol, urg, desc in scenarios:
    horizon = optimal_trade_horizon(qty, adv, vol, urg)
    print(f"{desc}:")
    print(f" Order: {qty:,} shares ({qty/adv:.1%} of ADV)")
    print(f" Optimal horizon: {horizon:.2f} days ({horizon*6.5:.1f} hours)")
    print()
Section 15.4: Optimal Execution
Given price impact, how should we optimally execute large orders?
In this section, you will learn:
- TWAP (Time-Weighted Average Price)
- VWAP (Volume-Weighted Average Price)
- Almgren-Chriss optimal trajectory
- Implementation shortfall algorithms
Key Algorithms
| Algorithm | Strategy | Best For |
|---|---|---|
| TWAP | Equal slices over time | Low-urgency, uniform volume |
| VWAP | Match market volume profile | Benchmark matching |
| Implementation Shortfall | Minimize expected shortfall | Alpha-decay situations |
| Participation | Fixed % of market volume | Large orders, patient |
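The participation strategy in the last row of the table can be sketched as a simple loop over observed interval volumes. This is a minimal illustration, not a production algorithm: the function name, the 10% rate, and the volume figures are hypothetical, and it assumes each interval's market volume is known when the slice is sized.

```python
import math
from typing import List, Tuple

def participation_schedule(total_quantity: int, market_volumes: List[int],
                           participation_rate: float = 0.10) -> Tuple[List[int], int]:
    """Trade a fixed fraction of each interval's market volume until the order is done."""
    remaining = total_quantity
    fills = []
    for volume in market_volumes:
        # Cap each slice at the target share of that interval's market volume
        # (the small epsilon guards against floating-point rounding just below an integer)
        cap = math.floor(participation_rate * volume + 1e-9)
        qty = min(remaining, cap)
        fills.append(qty)
        remaining -= qty
    return fills, remaining

# Hypothetical interval volumes for a 10,000-share order at 10% participation
volumes = [30_000, 20_000, 25_000, 40_000, 50_000]
fills, unfilled = participation_schedule(10_000, volumes, participation_rate=0.10)
print(f"Fills per interval: {fills}")
print(f"Unfilled shares: {unfilled}")
```

Unlike TWAP and VWAP, the completion time is not fixed in advance: the order finishes early when the market is busy, and some shares can remain unfilled when volume is thin.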
class ExecutionAlgorithm:
    """Implementation of common execution algorithms."""

    def __init__(self, total_quantity: int, time_horizon: float, num_slices: int = 20):
        self.total_quantity = total_quantity
        self.time_horizon = time_horizon
        self.num_slices = num_slices
        self.times = np.linspace(0, time_horizon, num_slices + 1)

    def twap(self) -> Dict:
        """Time-Weighted Average Price - equal amounts at regular intervals."""
        slice_qty = self.total_quantity // self.num_slices
        remainder = self.total_quantity % self.num_slices
        quantities = [slice_qty] * self.num_slices
        quantities[-1] += remainder  # last slice absorbs the rounding remainder
        return {
            'name': 'TWAP',
            'times': self.times[1:],
            'quantities': quantities,
            'cumulative': np.cumsum(quantities)
        }

    def vwap(self, volume_profile: Optional[np.ndarray] = None) -> Dict:
        """Volume-Weighted Average Price - proportional to expected volume."""
        if volume_profile is None:
            volume_profile = self._default_volume_profile()
        # Resample the profile if its length does not match the number of slices
        if len(volume_profile) != self.num_slices:
            volume_profile = np.interp(
                np.linspace(0, 1, self.num_slices),
                np.linspace(0, 1, len(volume_profile)),
                volume_profile
            )
        volume_profile = np.array(volume_profile)
        volume_pct = volume_profile / volume_profile.sum()
        quantities = np.round(self.total_quantity * volume_pct).astype(int)
        diff = self.total_quantity - quantities.sum()
        quantities[-1] += diff  # last slice absorbs the rounding difference
        return {
            'name': 'VWAP',
            'times': self.times[1:],
            'quantities': list(quantities),
            'cumulative': np.cumsum(quantities)
        }

    def almgren_chriss(self, risk_aversion: float = 1e-6, sigma: float = 0.02) -> Dict:
        """Almgren-Chriss optimal execution trajectory."""
        eta = 0.01  # illustrative temporary impact coefficient
        kappa = np.sqrt(risk_aversion * sigma**2 / eta)
        T = self.time_horizon
        # Remaining position at each time: X * sinh(kappa * (T - t)) / sinh(kappa * T)
        positions = []
        for t in self.times:
            if t >= T:
                pos = 0
            else:
                pos = self.total_quantity * np.sinh(kappa * (T - t)) / np.sinh(kappa * T)
            positions.append(pos)
        positions = np.array(positions)
        quantities = -np.diff(positions)
        quantities = np.round(quantities).astype(int)
        diff = self.total_quantity - quantities.sum()
        quantities[-1] += diff
        return {
            'name': 'Almgren-Chriss',
            'times': self.times[1:],
            'quantities': list(quantities),
            'cumulative': np.cumsum(quantities),
            'kappa': kappa
        }

    def _default_volume_profile(self) -> np.ndarray:
        """Default U-shaped intraday volume profile."""
        x = np.linspace(0, 1, self.num_slices)
        profile = 1 + 2 * (x - 0.5)**2
        return profile
# Compare execution algorithms
algo = ExecutionAlgorithm(total_quantity=100000, time_horizon=1.0, num_slices=20)
twap = algo.twap()
vwap = algo.vwap()
# Note: with eta = 0.01 and sigma = 0.02, both risk-aversion values below give
# kappa * T far below 1, so the two Almgren-Chriss schedules are nearly linear
# (TWAP-like). Visible front-loading needs kappa * T of order 1 or larger.
ac_low = algo.almgren_chriss(risk_aversion=1e-7)
ac_high = algo.almgren_chriss(risk_aversion=1e-5)
print("Execution Algorithm Comparison")
print("=" * 50)
print(f"Order: 100,000 shares over 1 day ({algo.num_slices} slices)")
# Visualize execution trajectories
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
for strategy, color, ls in [(twap, 'blue', '-'), (vwap, 'green', '-'),
                            (ac_low, 'red', '--'), (ac_high, 'orange', '--')]:
    label = strategy['name']
    if 'Almgren' in label:
        # Scientific notation, since kappa is far below 0.1 for these inputs
        label += f" (kappa={strategy['kappa']:.1e})"
    axes[0].plot(strategy['times'], strategy['cumulative'],
                 label=label, color=color, linestyle=ls, linewidth=2)
axes[0].set_xlabel('Time (days)')
axes[0].set_ylabel('Cumulative Shares Executed')
axes[0].set_title('Execution Trajectories')
axes[0].legend()
axes[0].grid(True, alpha=0.3)
width = 0.02
x = np.array(twap['times'])
axes[1].bar(x - 1.5*width, twap['quantities'], width, label='TWAP', alpha=0.7)
axes[1].bar(x - 0.5*width, vwap['quantities'], width, label='VWAP', alpha=0.7)
axes[1].bar(x + 0.5*width, ac_low['quantities'], width, label='AC (Patient)', alpha=0.7)
axes[1].bar(x + 1.5*width, ac_high['quantities'], width, label='AC (Urgent)', alpha=0.7)
axes[1].set_xlabel('Time (days)')
axes[1].set_ylabel('Shares per Slice')
axes[1].set_title('Execution Rate by Time')
axes[1].legend()
plt.tight_layout()
plt.show()
Exercise 15.4: Implementation Shortfall Schedule (Open-ended)
Your Task:
Build a function that generates an execution schedule to minimize implementation shortfall (alpha decay).
The function should:
- Front-load execution to capture alpha before it decays
- Use exponential decay weighting
- Return a list of quantities for each slice
- Use a higher alpha_decay_rate for more aggressive front-loading
Your implementation:
Solution:
def implementation_shortfall_schedule(total_quantity: int, num_slices: int,
                                      alpha_decay_rate: float = 0.1) -> List[int]:
    """
    Generate execution schedule that minimizes implementation shortfall.
    Uses exponential decay weighting - trade more early when alpha
    is strongest, less as alpha decays.
    """
    # Exponential decay weights
    times = np.arange(num_slices)
    weights = np.exp(-alpha_decay_rate * times)
    # Normalize
    weights = weights / weights.sum()
    # Allocate quantities
    quantities = np.round(total_quantity * weights).astype(int)
    # Adjust for rounding
    diff = total_quantity - quantities.sum()
    quantities[0] += diff
    return list(quantities)

# Compare IS schedules with different decay rates
total_qty = 100000
num_slices = 20
schedules = {
    'TWAP (baseline)': [total_qty // num_slices] * num_slices,
    'IS (slow decay)': implementation_shortfall_schedule(total_qty, num_slices, 0.05),
    'IS (medium decay)': implementation_shortfall_schedule(total_qty, num_slices, 0.15),
    'IS (fast decay)': implementation_shortfall_schedule(total_qty, num_slices, 0.30),
}
# Visualize
fig, ax = plt.subplots(figsize=(12, 5))
x = np.arange(num_slices)
width = 0.2
for i, (name, schedule) in enumerate(schedules.items()):
    ax.bar(x + i*width, schedule, width, label=name, alpha=0.7)
ax.set_xlabel('Time Slice')
ax.set_ylabel('Shares per Slice')
ax.set_title('Implementation Shortfall Algorithm - Front-Loading Comparison')
ax.legend()
ax.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()
print("\nFirst 5 slices for each strategy:")
for name, schedule in schedules.items():
    first_five_pct = sum(schedule[:5]) / sum(schedule) * 100
    print(f"{name}: {schedule[:5]} ({first_five_pct:.1f}% in first 25% of time)")
Exercise 15.5: VWAP Tracker (Open-ended)
Your Task:
Build a class that tracks VWAP execution performance in real-time.
The class should:
- Track actual executions vs planned VWAP schedule
- Calculate slippage vs VWAP benchmark
- Provide metrics on execution quality
- Handle partial fills and timing deviations
Your implementation:
Click to reveal solution
class VWAPTracker:
    """Tracks VWAP execution performance in real-time."""

    def __init__(self, target_quantity: int, planned_schedule: List[int],
                 benchmark_vwap: float):
        self.target_quantity = target_quantity
        self.planned_schedule = planned_schedule
        self.benchmark_vwap = benchmark_vwap
        self.fills = []
        self.total_executed = 0
        self.total_cost = 0.0

    def record_fill(self, timestamp: datetime, quantity: int, price: float):
        """Record an execution fill."""
        self.fills.append({
            'timestamp': timestamp,
            'quantity': quantity,
            'price': price,
            'cost': quantity * price
        })
        self.total_executed += quantity
        self.total_cost += quantity * price

    def get_execution_stats(self) -> Dict:
        """Get execution quality statistics."""
        if self.total_executed == 0:
            return {'error': 'No executions recorded'}
        actual_vwap = self.total_cost / self.total_executed
        slippage = actual_vwap - self.benchmark_vwap
        slippage_bps = slippage / self.benchmark_vwap * 10000
        completion_rate = self.total_executed / self.target_quantity
        return {
            'actual_vwap': actual_vwap,
            'benchmark_vwap': self.benchmark_vwap,
            'slippage': slippage,
            'slippage_bps': slippage_bps,
            'total_executed': self.total_executed,
            'target_quantity': self.target_quantity,
            'completion_rate': completion_rate,
            'num_fills': len(self.fills)
        }

# Test the tracker
tracker = VWAPTracker(
    target_quantity=10000,
    planned_schedule=[2000, 1500, 1500, 2000, 3000],
    benchmark_vwap=100.50
)

# Simulate fills with some slippage
np.random.seed(42)
base_time = datetime.now()
fills = [
    (2000, 100.52),
    (1500, 100.48),
    (1500, 100.55),
    (2000, 100.51),
    (3000, 100.53)
]
for i, (qty, price) in enumerate(fills):
    timestamp = base_time + timedelta(minutes=i*30)
    tracker.record_fill(timestamp, qty, price)

stats = tracker.get_execution_stats()
print("VWAP Execution Report")
print("=" * 40)
print(f"Benchmark VWAP: ${stats['benchmark_vwap']:.4f}")
print(f"Actual VWAP: ${stats['actual_vwap']:.4f}")
print(f"Slippage: {stats['slippage_bps']:.2f} bps")
print(f"Completion: {stats['completion_rate']:.1%}")
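The solution above stores planned_schedule but never compares fills against it. One way to cover the "actual vs planned" requirement is to compare cumulative executed quantity with the cumulative plan, bucket by bucket. The sketch below is an illustration, not part of the original solution: the helper name schedule_deviation and the assumption that executed[i] holds the total filled in schedule bucket i are both hypothetical.

```python
from typing import Dict, List


def schedule_deviation(planned: List[int], executed: List[int]) -> List[Dict]:
    """Compare cumulative executed quantity with the cumulative plan.

    Assumption (hypothetical): executed[i] is the total quantity filled
    during the i-th schedule bucket; buckets not yet traded count as zero.
    """
    rows = []
    cum_planned = 0
    cum_executed = 0
    for i, plan_qty in enumerate(planned):
        fill_qty = executed[i] if i < len(executed) else 0
        cum_planned += plan_qty
        cum_executed += fill_qty
        rows.append({
            'bucket': i,
            'planned': plan_qty,
            'executed': fill_qty,
            # Negative value means execution is behind the plan
            'cum_deviation': cum_executed - cum_planned,
        })
    return rows


planned = [2000, 1500, 1500, 2000, 3000]
executed = [1800, 1500, 1700, 2000]  # partial fill in bucket 0, bucket 4 not yet traded

report = schedule_deviation(planned, executed)
for row in report:
    print(f"Bucket {row['bucket']}: planned {row['planned']:>5}, "
          f"executed {row['executed']:>5}, cumulative deviation {row['cum_deviation']:+d}")
```

A tracker could call a helper like this from get_execution_stats to report how far ahead of or behind schedule the order is at any point.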
Exercise 15.6: Complete Microstructure Analyzer (Open-ended)
Your Task:
Build a comprehensive MicrostructureAnalyzer class that combines all the concepts.
The class should:
- Maintain an order book and spread analyzer
- Simulate market activity with random orders/trades
- Calculate comprehensive metrics (spreads, imbalance, impact)
- Compare execution algorithms
- Generate a formatted report
Your implementation:
Solution:
class MicrostructureAnalyzer:
    """Comprehensive market microstructure analysis tool."""

    def __init__(self, symbol: str = 'SAMPLE', tick_size: float = 0.01):
        self.symbol = symbol
        self.tick_size = tick_size
        self.order_book = LimitOrderBook(tick_size=tick_size)
        self.spread_analyzer = SpreadAnalyzer()
        self.impact_model = None
        self.quote_history = []
        self.trade_history = []
        self.metrics = {}

    def simulate_market(self, num_events: int = 1000, initial_price: float = 100.0,
                        volatility: float = 0.02, avg_daily_volume: int = 1000000):
        """Simulate market activity."""
        np.random.seed(42)
        self.impact_model = PriceImpactModel(sigma=volatility, avg_daily_volume=avg_daily_volume)
        current_mid = initial_price
        spread = 0.05
        # Populate initial book
        for i in range(5):
            bid = current_mid - spread/2 - i * self.tick_size
            ask = current_mid + spread/2 + i * self.tick_size
            qty = np.random.randint(100, 1000)
            self.order_book.add_order(OrderSide.BUY, OrderType.LIMIT, qty, bid)
            self.order_book.add_order(OrderSide.SELL, OrderType.LIMIT, qty, ask)
        base_time = datetime.now()
        for i in range(num_events):
            timestamp = base_time + timedelta(seconds=i)
            current_mid += np.random.normal(0, volatility/100)
            bid = self.order_book.best_bid()
            ask = self.order_book.best_ask()
            if bid and ask:
                self.spread_analyzer.add_quote(timestamp, bid, ask)
                self.quote_history.append({'timestamp': timestamp, 'bid': bid, 'ask': ask})
            event = np.random.choice(['limit', 'market', 'cancel'], p=[0.6, 0.25, 0.15])
            if event == 'limit':
                side = OrderSide.BUY if np.random.random() < 0.5 else OrderSide.SELL
                qty = np.random.randint(50, 500)
                price = current_mid - np.random.uniform(0, spread) if side == OrderSide.BUY else current_mid + np.random.uniform(0, spread)
                self.order_book.add_order(side, OrderType.LIMIT, qty, price)
            elif event == 'market':
                side = OrderSide.BUY if np.random.random() < 0.5 else OrderSide.SELL
                qty = np.random.randint(50, 300)
                order, trades = self.order_book.add_order(side, OrderType.MARKET, qty)
                for trade in trades:
                    self.spread_analyzer.add_trade(timestamp, trade.price, side.value)
                    self.trade_history.append({'timestamp': timestamp, 'price': trade.price, 'quantity': trade.quantity, 'side': side.value})
            else:
                if self.order_book.orders:
                    order_id = np.random.choice(list(self.order_book.orders.keys()))
                    self.order_book.cancel_order(order_id)
        print(f"Simulated {num_events} events: {len(self.quote_history)} quotes, {len(self.trade_history)} trades")

    def calculate_metrics(self) -> Dict:
        """Calculate comprehensive metrics."""
        spread_stats = self.spread_analyzer.summary_stats()
        imbalance = calculate_book_imbalance(self.order_book, levels=3)
        if self.trade_history:
            df_trades = pd.DataFrame(self.trade_history)
            buy_volume = df_trades[df_trades['side'] == 'buy']['quantity'].sum()
            sell_volume = df_trades[df_trades['side'] == 'sell']['quantity'].sum()
            order_flow_imbalance = (buy_volume - sell_volume) / (buy_volume + sell_volume) if (buy_volume + sell_volume) > 0 else 0
            avg_trade_size = df_trades['quantity'].mean()
            prices = df_trades['price'].values
            roll_spread = estimate_spread_roll(prices)
        else:
            order_flow_imbalance = 0
            avg_trade_size = 0
            roll_spread = None
        self.metrics = {
            'spread_stats': spread_stats,
            'book_imbalance': imbalance,
            'order_flow_imbalance': order_flow_imbalance,
            'avg_trade_size': avg_trade_size,
            'roll_spread_estimate': roll_spread
        }
        return self.metrics

    def analyze_execution(self, quantity: int) -> Dict:
        """Compare execution strategies."""
        algo = ExecutionAlgorithm(total_quantity=quantity, time_horizon=1.0)
        results = {
            'TWAP': algo.twap(),
            'VWAP': algo.vwap(),
            'Almgren-Chriss': algo.almgren_chriss()
        }
        if self.impact_model:
            for name, schedule in results.items():
                total_impact = sum(
                    self.impact_model.square_root_impact(qty)['total_impact_bps'] * qty
                    for qty in schedule['quantities']
                )
                schedule['estimated_cost_bps'] = total_impact / quantity
        return results

    def generate_report(self) -> str:
        """Generate formatted report."""
        if not self.metrics:
            self.calculate_metrics()
        lines = [
            "=" * 60,
            f"MICROSTRUCTURE ANALYSIS REPORT - {self.symbol}",
            "=" * 60,
            "",
            "SPREAD ANALYSIS",
            "-" * 40
        ]
        ss = self.metrics.get('spread_stats', {})
        if ss:
            lines.append(f"Average Quoted Spread: {ss.get('avg_quoted_spread_bps', 0):.2f} bps")
        if self.metrics.get('roll_spread_estimate'):
            lines.append(f"Roll Spread Estimate: ${self.metrics['roll_spread_estimate']:.4f}")
        lines.extend([
            "",
            "ORDER BOOK STATE",
            "-" * 40,
            f"Book Imbalance: {self.metrics.get('book_imbalance', 0):.2%}",
            f"Best Bid: ${self.order_book.best_bid():.2f}" if self.order_book.best_bid() else "Best Bid: N/A",
            f"Best Ask: ${self.order_book.best_ask():.2f}" if self.order_book.best_ask() else "Best Ask: N/A",
            "",
            "TRADING ACTIVITY",
            "-" * 40,
            f"Total Trades: {len(self.trade_history)}",
            f"Average Trade Size: {self.metrics.get('avg_trade_size', 0):.0f} shares",
            f"Order Flow Imbalance: {self.metrics.get('order_flow_imbalance', 0):.2%}",
            "",
            "=" * 60
        ])
        return "\n".join(lines)

# Run the analyzer
analyzer = MicrostructureAnalyzer(symbol='DEMO')
analyzer.simulate_market(num_events=500, initial_price=100.0)
metrics = analyzer.calculate_metrics()
print(analyzer.generate_report())

# Analyze execution
print("\nExecution Analysis for 10,000 shares:")
exec_results = analyzer.analyze_execution(10000)
for name, result in exec_results.items():
    cost = result.get('estimated_cost_bps', 'N/A')
    print(f" {name}: {cost:.2f} bps estimated impact" if isinstance(cost, float) else f" {name}: {cost}")
Module Project: Production Microstructure System
Put together everything you've learned to build a comprehensive microstructure analysis system.
# YOUR CODE HERE - Module Project
# Build a complete microstructure analysis system that:
# 1. Simulates realistic order book activity
# 2. Calculates spread metrics (quoted, effective, Roll estimate)
# 3. Models price impact for different order sizes
# 4. Compares execution algorithms (TWAP, VWAP, AC, IS)
# 5. Generates a comprehensive analysis report
Solution:
class ProductionMicrostructureSystem:
    """
    Complete microstructure analysis system for production use.

    Features:
    - Order book simulation and analysis
    - Spread decomposition (quoted, effective, Roll)
    - Price impact estimation
    - Execution algorithm comparison
    - Comprehensive reporting
    """

    def __init__(self, symbol: str, tick_size: float = 0.01):
        self.symbol = symbol
        self.tick_size = tick_size
        # Core components
        self.order_book = LimitOrderBook(tick_size=tick_size)
        self.spread_analyzer = SpreadAnalyzer()
        self.impact_model = None
        # Data storage
        self.quote_history = []
        self.trade_history = []
        self.metrics = {}
        self.execution_analysis = {}

    def initialize_market(self, initial_price: float = 100.0,
                          volatility: float = 0.02,
                          avg_daily_volume: int = 1_000_000):
        """Initialize market parameters."""
        self.initial_price = initial_price
        self.volatility = volatility
        self.adv = avg_daily_volume
        self.impact_model = PriceImpactModel(sigma=volatility, avg_daily_volume=avg_daily_volume)
        # Build initial book
        spread = 0.05
        for i in range(5):
            bid = initial_price - spread/2 - i * self.tick_size
            ask = initial_price + spread/2 + i * self.tick_size
            qty = np.random.randint(200, 1000)
            self.order_book.add_order(OrderSide.BUY, OrderType.LIMIT, qty, bid)
            self.order_book.add_order(OrderSide.SELL, OrderType.LIMIT, qty, ask)

    def simulate_trading_day(self, num_events: int = 1000):
        """Simulate a full trading day."""
        np.random.seed(42)
        current_mid = self.initial_price
        base_time = datetime.now()
        spread = 0.05
        for i in range(num_events):
            timestamp = base_time + timedelta(seconds=i * 23.4)  # ~6.5 hours
            current_mid += np.random.normal(0, self.volatility / 100)
            # Record quote
            bid = self.order_book.best_bid()
            ask = self.order_book.best_ask()
            if bid and ask:
                self.spread_analyzer.add_quote(timestamp, bid, ask)
                self.quote_history.append({
                    'timestamp': timestamp, 'bid': bid, 'ask': ask,
                    'midpoint': (bid + ask) / 2
                })
            # Random event
            event = np.random.choice(['limit', 'market', 'cancel'], p=[0.6, 0.25, 0.15])
            if event == 'limit':
                side = OrderSide.BUY if np.random.random() < 0.5 else OrderSide.SELL
                qty = np.random.randint(50, 500)
                offset = np.random.uniform(0, spread)
                price = current_mid - offset if side == OrderSide.BUY else current_mid + offset
                self.order_book.add_order(side, OrderType.LIMIT, qty, price)
            elif event == 'market':
                side = OrderSide.BUY if np.random.random() < 0.5 else OrderSide.SELL
                qty = np.random.randint(50, 300)
                order, trades = self.order_book.add_order(side, OrderType.MARKET, qty)
                for trade in trades:
                    self.spread_analyzer.add_trade(timestamp, trade.price, side.value)
                    self.trade_history.append({
                        'timestamp': timestamp,
                        'price': trade.price,
                        'quantity': trade.quantity,
                        'side': side.value
                    })
            else:
                if self.order_book.orders:
                    order_id = np.random.choice(list(self.order_book.orders.keys()))
                    self.order_book.cancel_order(order_id)

    def calculate_all_metrics(self) -> Dict:
        """Calculate comprehensive metrics."""
        # Spread metrics
        spread_stats = self.spread_analyzer.summary_stats()
        # Book imbalance
        book_imbalance = calculate_book_imbalance(self.order_book, levels=3)
        # Trade metrics
        if self.trade_history:
            df_trades = pd.DataFrame(self.trade_history)
            buy_vol = df_trades[df_trades['side'] == 'buy']['quantity'].sum()
            sell_vol = df_trades[df_trades['side'] == 'sell']['quantity'].sum()
            total_vol = buy_vol + sell_vol
            order_flow_imbalance = (buy_vol - sell_vol) / total_vol if total_vol > 0 else 0
            avg_trade_size = df_trades['quantity'].mean()
            # Roll spread
            prices = df_trades['price'].values
            roll_spread = estimate_spread_roll(prices)
        else:
            order_flow_imbalance = 0
            avg_trade_size = 0
            roll_spread = None
        self.metrics = {
            'spread': spread_stats,
            'book_imbalance': book_imbalance,
            'order_flow_imbalance': order_flow_imbalance,
            'avg_trade_size': avg_trade_size,
            'roll_spread': roll_spread,
            'num_quotes': len(self.quote_history),
            'num_trades': len(self.trade_history)
        }
        return self.metrics

    def analyze_execution_strategies(self, order_sizes: List[int]) -> Dict:
        """Analyze execution strategies for various order sizes."""
        results = {}
        for size in order_sizes:
            algo = ExecutionAlgorithm(total_quantity=size, time_horizon=1.0)
            strategies = {
                'TWAP': algo.twap(),
                'VWAP': algo.vwap(),
                'Almgren-Chriss': algo.almgren_chriss()
            }
            for name, schedule in strategies.items():
                if self.impact_model:
                    total_impact = sum(
                        self.impact_model.square_root_impact(qty)['total_impact_bps'] * qty
                        for qty in schedule['quantities']
                    )
                    schedule['estimated_cost_bps'] = total_impact / size
            results[size] = strategies
        self.execution_analysis = results
        return results

    def generate_full_report(self) -> str:
        """Generate comprehensive analysis report."""
        if not self.metrics:
            self.calculate_all_metrics()
        lines = [
            "=" * 70,
            "MICROSTRUCTURE ANALYSIS REPORT",
            f"Symbol: {self.symbol}",
            f"Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}",
            "=" * 70,
            "",
            "1. MARKET OVERVIEW",
            "-" * 50,
            f" Initial Price: ${self.initial_price:.2f}",
            f" Volatility: {self.volatility:.1%}",
            f" Avg Daily Volume: {self.adv:,}",
            f" Quotes Recorded: {self.metrics['num_quotes']:,}",
            f" Trades Executed: {self.metrics['num_trades']:,}",
            "",
            "2. SPREAD ANALYSIS",
            "-" * 50
        ]
        ss = self.metrics.get('spread', {})
        if ss:
            lines.extend([
                f" Avg Quoted Spread: {ss.get('avg_quoted_spread_bps', 0):.2f} bps",
                f" Median Quoted Spread: {ss.get('median_quoted_spread_bps', 0):.2f} bps",
                f" Spread Range: {ss.get('min_quoted_spread_bps', 0):.2f} - {ss.get('max_quoted_spread_bps', 0):.2f} bps"
            ])
        if ss and 'avg_effective_spread_bps' in ss:
            lines.append(f" Avg Effective Spread: {ss['avg_effective_spread_bps']:.2f} bps")
        if self.metrics.get('roll_spread'):
            lines.append(f" Roll Model Estimate: ${self.metrics['roll_spread']:.4f}")
        lines.extend([
            "",
            "3. ORDER BOOK STATE",
            "-" * 50,
            f" Best Bid: ${self.order_book.best_bid():.2f}" if self.order_book.best_bid() else " Best Bid: N/A",
            f" Best Ask: ${self.order_book.best_ask():.2f}" if self.order_book.best_ask() else " Best Ask: N/A",
            f" Current Spread: ${self.order_book.spread():.2f}" if self.order_book.spread() else " Current Spread: N/A",
            f" Book Imbalance: {self.metrics['book_imbalance']:.2%}",
            "",
            "4. TRADING ACTIVITY",
            "-" * 50,
            f" Average Trade Size: {self.metrics['avg_trade_size']:.0f} shares",
            f" Order Flow Imbalance: {self.metrics['order_flow_imbalance']:.2%}",
        ])
        if self.execution_analysis:
            lines.extend([
                "",
                "5. EXECUTION ANALYSIS",
                "-" * 50
            ])
            for size, strategies in self.execution_analysis.items():
                lines.append(f"\n Order Size: {size:,} shares ({size/self.adv:.1%} of ADV)")
                for name, result in strategies.items():
                    cost = result.get('estimated_cost_bps', 'N/A')
                    if isinstance(cost, float):
                        lines.append(f" {name}: {cost:.2f} bps")
        lines.extend([
            "",
            "=" * 70,
            "END OF REPORT",
            "=" * 70
        ])
        return "\n".join(lines)

# Run complete analysis
system = ProductionMicrostructureSystem(symbol='AAPL')
system.initialize_market(initial_price=175.0, volatility=0.025, avg_daily_volume=50_000_000)
system.simulate_trading_day(num_events=1000)
system.calculate_all_metrics()
system.analyze_execution_strategies([10000, 50000, 100000, 500000])
print(system.generate_full_report())
system.order_book.display()
Key Takeaways
What You Learned
1. Order Book Mechanics
- Limit order books match orders by price-time priority
- Market orders provide immediate execution but pay the spread
- Book imbalance can predict short-term price direction
2. Bid-Ask Spread
- Compensates market makers for inventory risk and adverse selection
- Effective spread often differs from quoted spread
- Roll model estimates spread from price autocovariance
3. Price Impact
- Grows with square root of trade size (not linearly)
- Has permanent (information) and temporary (pressure) components
- Key input for execution optimization
4. Optimal Execution
- TWAP: Simple, equal slices over time
- VWAP: Match market volume profile
- Almgren-Chriss: Optimal risk-cost tradeoff
- Implementation Shortfall: Front-load when alpha decays
Best Practices
- Understand your stock's typical spread and impact
- Size orders relative to ADV (< 10% participation is typical)
- Choose algorithm based on urgency and information
- Monitor execution quality vs benchmarks
Coming Up Next
In Module 16: High-Frequency Concepts, we'll explore:
- Latency and co-location
- HFT strategies and market making
- Regulations and market structure
Congratulations on completing Module 15!
Module 16: High-Frequency Concepts
Course 3: Quantitative Finance & Portfolio Theory
Part 5: Production & Infrastructure
Learning Objectives
By the end of this module, you will be able to:
- Understand latency measurement and optimization concepts
- Calculate network latency based on distance and medium
- Analyze co-location infrastructure and ROI
- Implement basic HFT strategy simulations
| Attribute | Value |
|---|---|
| Duration | ~2 hours |
| Exercises | 6 (3 guided + 3 open-ended) |
| Prerequisites | Module 15 (Market Microstructure) |
Important Note: This module is educational. Building actual HFT systems requires significant capital, specialized infrastructure, and regulatory compliance.
Setup and Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime, timedelta
from dataclasses import dataclass
from typing import List, Dict, Optional
from collections import deque
import time
import warnings
warnings.filterwarnings('ignore')
pd.set_option('display.float_format', lambda x: f'{x:.4f}')
np.set_printoptions(precision=4)
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
print("Module 16: High-Frequency Concepts")
print("=" * 45)
Section 16.1: Latency and Speed
In HFT, latency is everything. A system that's 10 microseconds faster can capture opportunities others miss.
In this section, you will learn:
- Types of latency in trading systems
- Latency measurement techniques
- Statistical analysis of latency distributions
What is Latency?
Latency is the time delay in a system. For trading, we care about:
- Market Data Latency: Time for price updates to reach your system
- Processing Latency: Time for your system to make a decision
- Order Latency: Time for your order to reach the exchange
- Round-Trip Latency: Total time from signal to execution confirmation
Latency Benchmarks
| Component | Typical Range | HFT Target |
|---|---|---|
| Network (co-located) | 1-10 μs | < 5 μs |
| Network (remote) | 1-100 ms | N/A |
| Software processing | 10-1000 μs | < 10 μs |
| Exchange matching | 5-50 μs | - |
| Round-trip (co-lo) | 20-100 μs | < 50 μs |
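To see how the component figures in the table combine, the sketch below sums an illustrative set of per-component latencies into a round-trip number and checks it against the 50 μs co-located target. The individual values are assumptions picked from within the table's typical ranges, not measurements.

```python
# Illustrative per-component latencies in microseconds (assumed values,
# chosen from within the "Typical Range" column of the table).
components_us = {
    'network_inbound': 5.0,       # market data arriving, co-located
    'software_processing': 10.0,  # decision logic
    'network_outbound': 5.0,      # order sent to the exchange
    'exchange_matching': 20.0,    # matching engine
    'ack_return': 5.0,            # execution confirmation coming back
}

ROUND_TRIP_TARGET_US = 50.0  # HFT target for a co-located round trip

round_trip_us = sum(components_us.values())
headroom_us = ROUND_TRIP_TARGET_US - round_trip_us

print(f"Round-trip latency: {round_trip_us:.1f} us")
print(f"Headroom vs target: {headroom_us:+.1f} us")
for name, value in components_us.items():
    print(f"  {name:22} {value:5.1f} us ({value / round_trip_us:.0%})")
```

With these assumed numbers, exchange matching is the largest single contributor, and it is the one component the trading firm cannot optimize.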
class LatencyMeasurement:
    """Tool for measuring and analyzing latency."""

    def __init__(self, name: str):
        self.name = name
        self.measurements = []
        self._start_time = None

    def start(self):
        """Start timing."""
        self._start_time = time.perf_counter_ns()

    def stop(self) -> int:
        """Stop timing and record measurement."""
        if self._start_time is None:
            raise ValueError("Timer not started")
        end_time = time.perf_counter_ns()
        latency_ns = end_time - self._start_time
        self.measurements.append(latency_ns)
        self._start_time = None
        return latency_ns

    def record(self, latency_ns: int):
        """Directly record a latency measurement."""
        self.measurements.append(latency_ns)

    def statistics(self) -> Dict:
        """Calculate latency statistics."""
        if not self.measurements:
            return {}
        arr = np.array(self.measurements)
        return {
            'name': self.name,
            'count': len(arr),
            'mean_ns': np.mean(arr),
            'mean_us': np.mean(arr) / 1000,
            'median_us': np.median(arr) / 1000,
            'std_ns': np.std(arr),
            'min_ns': np.min(arr),
            'max_ns': np.max(arr),
            'p50_ns': np.percentile(arr, 50),
            'p95_ns': np.percentile(arr, 95),
            'p99_ns': np.percentile(arr, 99),
            'p99_9_ns': np.percentile(arr, 99.9),
        }


class LatencyProfiler:
    """Profile latency across multiple components."""

    def __init__(self):
        self.components = {}

    def add_component(self, name: str) -> LatencyMeasurement:
        self.components[name] = LatencyMeasurement(name)
        return self.components[name]

    def get_component(self, name: str) -> LatencyMeasurement:
        return self.components.get(name)

    def summary(self) -> pd.DataFrame:
        rows = []
        for name, comp in self.components.items():
            stats = comp.statistics()
            if stats:
                rows.append({
                    'Component': name,
                    'Count': stats['count'],
                    'Mean (μs)': stats['mean_us'],
                    'Median (μs)': stats['median_us'],
                    'P95 (μs)': stats['p95_ns'] / 1000,
                    'P99 (μs)': stats['p99_ns'] / 1000,
                    'Max (μs)': stats['max_ns'] / 1000,
                })
        return pd.DataFrame(rows)
# Demonstrate latency measurement
print("Latency Measurement Demo")
print("=" * 50)
profiler = LatencyProfiler()

# Component 1: Dictionary lookup
dict_lookup = profiler.add_component("dict_lookup")
test_dict = {str(i): i for i in range(10000)}
for _ in range(1000):
    dict_lookup.start()
    _ = test_dict.get("5000")
    dict_lookup.stop()

# Component 2: List append
list_append = profiler.add_component("list_append")
test_list = []
for i in range(1000):
    list_append.start()
    test_list.append(i)
    list_append.stop()

# Component 3: NumPy operation
numpy_op = profiler.add_component("numpy_mean")
arr = np.random.randn(1000)
for _ in range(1000):
    numpy_op.start()
    _ = np.mean(arr)
    numpy_op.stop()

print(profiler.summary().to_string(index=False))
print("\nNote: These are Python operations - HFT systems use C++ for nanosecond-level operations")
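The summary table reports P95 and P99 alongside the mean because latency distributions are usually right-skewed: a few slow outliers barely move the median but dominate the tail. The sketch below illustrates this with synthetic log-normal samples. The distribution and its parameters are assumptions chosen for illustration, not a model of any real system.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic latencies in nanoseconds: log-normal with a 500 ns median
# (assumed parameters, for illustration only)
samples_ns = rng.lognormal(mean=np.log(500), sigma=0.5, size=10_000)

mean_ns = samples_ns.mean()
p50_ns, p99_ns = np.percentile(samples_ns, [50, 99])

print(f"Mean:   {mean_ns:8.0f} ns")
print(f"Median: {p50_ns:8.0f} ns")
print(f"P99:    {p99_ns:8.0f} ns ({p99_ns / p50_ns:.1f}x the median)")
```

For a skewed distribution like this one the mean sits above the median and the P99 sits well above both, which is why latency targets are often stated as percentiles rather than averages.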
# Simulate the impact of different latency levels on trading
class LatencySimulator:
    """Simulate how latency affects trading outcomes."""

    def __init__(self, opportunity_duration_us: float = 100):
        self.opportunity_duration = opportunity_duration_us

    def simulate_arbitrage(self, latency_us: float, initial_spread_bps: float = 10,
                           num_opportunities: int = 1000) -> Dict:
        """Simulate arbitrage capture with given latency."""
        captured = 0
        missed = 0
        partial = 0
        total_profit_bps = 0
        for _ in range(num_opportunities):
            remaining_spread = initial_spread_bps * (1 - latency_us / self.opportunity_duration)
            if latency_us >= self.opportunity_duration:
                missed += 1
            elif remaining_spread >= initial_spread_bps * 0.5:
                captured += 1
                total_profit_bps += remaining_spread
            else:
                partial += 1
                total_profit_bps += max(0, remaining_spread)
        return {
            'latency_us': latency_us,
            'opportunities': num_opportunities,
            'captured': captured,
            'partial': partial,
            'missed': missed,
            'capture_rate': captured / num_opportunities,
            'total_profit_bps': total_profit_bps,
            'avg_profit_bps': total_profit_bps / num_opportunities
        }

# Compare different latency levels
simulator = LatencySimulator(opportunity_duration_us=100)
latencies = [10, 25, 50, 75, 100, 150, 200]
results = [simulator.simulate_arbitrage(lat, initial_spread_bps=10) for lat in latencies]
df_results = pd.DataFrame(results)
print("Impact of Latency on Arbitrage Capture")
print("=" * 60)
print(f"Opportunity duration: 100 μs, Initial spread: 10 bps\n")
print(df_results[['latency_us', 'capture_rate', 'avg_profit_bps']].to_string(index=False))
# Visualize latency impact
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
axes[0].plot(df_results['latency_us'], df_results['capture_rate'] * 100, marker='o', linewidth=2)
axes[0].fill_between(df_results['latency_us'], 0, df_results['capture_rate'] * 100, alpha=0.3)
axes[0].set_xlabel('Latency (μs)')
axes[0].set_ylabel('Capture Rate (%)')
axes[0].set_title('Arbitrage Capture Rate vs Latency')
axes[0].grid(True, alpha=0.3)
axes[1].plot(df_results['latency_us'], df_results['avg_profit_bps'], marker='s', linewidth=2, color='green')
axes[1].fill_between(df_results['latency_us'], 0, df_results['avg_profit_bps'], alpha=0.3, color='green')
axes[1].set_xlabel('Latency (μs)')
axes[1].set_ylabel('Average Profit (bps)')
axes[1].set_title('Profit per Opportunity vs Latency')
axes[1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
print("\nKey insight: In competitive HFT, even 10 μs can mean the difference")
print("between capturing an opportunity and missing it entirely.")
Exercise 16.1: Latency Budget Calculator (Guided)
Your Task: Calculate how to allocate a total latency budget across different system components.
Components that are harder to optimize should get larger allocations.
Fill in the blanks:
Solution:
def calculate_latency_budget(total_budget_us: float, component_weights: Dict[str, float]) -> Dict:
    """
    Allocate latency budget across components.
    Higher weight = harder to optimize = larger allocation.
    """
    total_weight = sum(component_weights.values())
    allocations = {}
    for component, weight in component_weights.items():
        allocation = total_budget_us * (weight / total_weight)
        pct_of_total = (weight / total_weight) * 100
        allocations[component] = {
            'budget_us': allocation,
            'weight': weight,
            'pct_of_total': pct_of_total
        }
    return allocations

# Test
components = {
    'market_data_parsing': 2.0,
    'strategy_logic': 1.0,
    'risk_check': 0.5,
    'order_construction': 0.5,
    'network_io': 3.0,
}
budget = calculate_latency_budget(100, components)
print("Latency Budget Allocation (100 μs total)")
print("=" * 50)
for component, alloc in budget.items():
    print(f"{component:25} {alloc['budget_us']:6.1f} μs ({alloc['pct_of_total']:4.1f}%)")
Section 16.2: Co-Location Basics
Co-location means placing your trading servers physically close to the exchange's matching engine.
In this section, you will learn:
- Why physical proximity matters
- Network latency calculations
- Co-location infrastructure costs and ROI
Why Co-Location Matters
Light travels at approximately:
- 299,792 km/s in vacuum
- ~200,000 km/s in fiber optic cable

For one-way transmission over fiber, this means:
- 1 km of fiber = ~5 μs latency
- NY to Chicago (~1,200 km) = ~6 ms minimum
- NY to London (~5,500 km) = ~27 ms minimum
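These figures follow from dividing distance by propagation speed. A minimal sketch of that arithmetic, using the same rounded speeds and approximate distances quoted in the text (real cable routes are longer than the straight-line distance, so these are lower bounds):

```python
FIBER_KM_PER_S = 200_000     # approximate speed of light in fiber
VACUUM_KM_PER_S = 299_792    # speed of light in vacuum


def one_way_latency_us(distance_km: float, speed_km_per_s: float) -> float:
    """One-way propagation delay in microseconds."""
    return distance_km / speed_km_per_s * 1_000_000


for label, km in [("1 km", 1), ("NY to Chicago", 1_200), ("NY to London", 5_500)]:
    fiber_us = one_way_latency_us(km, FIBER_KM_PER_S)
    vacuum_us = one_way_latency_us(km, VACUUM_KM_PER_S)
    print(f"{label:15} fiber: {fiber_us:10.1f} us   vacuum limit: {vacuum_us:10.1f} us")
```

The gap between the fiber and vacuum columns is the reason some firms use microwave links on routes such as New York to Chicago: signals through air travel close to the vacuum speed.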
class NetworkLatencyCalculator:
    """Calculate network latency based on distance and medium."""

    SPEEDS = {
        'vacuum': 299792,
        'fiber': 200000,
        'microwave': 299792,
        'copper': 200000,
    }

    EXCHANGES = {
        'NYSE': {'location': 'Mahwah, NJ', 'lat': 41.08, 'lon': -74.14},
        'NASDAQ': {'location': 'Carteret, NJ', 'lat': 40.58, 'lon': -74.23},
        'CME': {'location': 'Aurora, IL', 'lat': 41.76, 'lon': -88.29},
        'LSE': {'location': 'Basildon, UK', 'lat': 51.57, 'lon': 0.49},
        'TSE': {'location': 'Tokyo, JP', 'lat': 35.68, 'lon': 139.75},
    }

    @classmethod
    def distance_km(cls, lat1: float, lon1: float, lat2: float, lon2: float) -> float:
        """Calculate distance using Haversine formula."""
        R = 6371
        lat1, lon1, lat2, lon2 = map(np.radians, [lat1, lon1, lat2, lon2])
        dlat = lat2 - lat1
        dlon = lon2 - lon1
        a = np.sin(dlat/2)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2)**2
        c = 2 * np.arcsin(np.sqrt(a))
        return R * c

    @classmethod
    def latency_one_way(cls, distance_km: float, medium: str = 'fiber',
                        overhead_factor: float = 1.2) -> Dict:
        """Calculate one-way latency."""
        effective_distance = distance_km * overhead_factor
        speed = cls.SPEEDS.get(medium, cls.SPEEDS['fiber'])
        latency_seconds = effective_distance / speed
        return {
            'distance_km': distance_km,
            'effective_distance_km': effective_distance,
            'medium': medium,
            'latency_ms': latency_seconds * 1000,
            'latency_us': latency_seconds * 1000000,
            'round_trip_ms': latency_seconds * 2000
        }

    @classmethod
    def exchange_to_exchange(cls, exchange1: str, exchange2: str,
                             medium: str = 'fiber') -> Dict:
        """Calculate latency between two exchanges."""
        loc1 = cls.EXCHANGES.get(exchange1)
        loc2 = cls.EXCHANGES.get(exchange2)
        if not loc1 or not loc2:
            return None
        distance = cls.distance_km(loc1['lat'], loc1['lon'], loc2['lat'], loc2['lon'])
        result = cls.latency_one_way(distance, medium)
        result['from'] = f"{exchange1} ({loc1['location']})"
        result['to'] = f"{exchange2} ({loc2['location']})"
        return result

# Calculate latencies between major exchanges
calc = NetworkLatencyCalculator()
print("Network Latency Between Major Exchanges")
print("=" * 70)
routes = [
    ('NYSE', 'NASDAQ'),
    ('NYSE', 'CME'),
    ('NYSE', 'LSE'),
    ('NYSE', 'TSE'),
    ('CME', 'LSE'),
]
for ex1, ex2 in routes:
    result = calc.exchange_to_exchange(ex1, ex2)
    print(f"{ex1} <-> {ex2}:")
    print(f" Distance: {result['distance_km']:,.0f} km")
    print(f" One-way (fiber): {result['latency_ms']:.2f} ms")
    print(f" Round-trip: {result['round_trip_ms']:.2f} ms")
    print()
# Co-location advantage visualization
distances = [0.01, 0.1, 1, 10, 100, 1000]
latencies = [calc.latency_one_way(d, 'fiber', overhead_factor=1.0)['latency_us'] for d in distances]
fig, ax = plt.subplots(figsize=(10, 5))
ax.semilogx(distances, latencies, marker='o', linewidth=2, markersize=10)
annotations = [
    (0.01, "Same rack (10m)"),
    (0.1, "Same data center (100m)"),
    (10, "Same city (10km)"),
    (1000, "Cross-country (1000km)"),
]
for dist, label in annotations:
    idx = distances.index(dist)
    ax.annotate(label, (dist, latencies[idx]), textcoords="offset points", xytext=(10, 10), fontsize=9)
ax.set_xlabel('Distance (km)')
ax.set_ylabel('One-way Latency (μs)')
ax.set_title('Network Latency vs Distance (Fiber Optic)')
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
print("Key distances:")
for d, l in zip(distances, latencies):
    print(f" {d*1000:>8.0f} meters: {l:>10.1f} μs")
Exercise 16.2: Co-Location ROI Calculator (Guided)
Your Task: Calculate the return on investment for a co-location setup.
Compare latency advantage against competitors to estimate profit potential.
Fill in the blanks:
Solution:
def calculate_colocation_roi(setup_cost: float, monthly_cost: float,
                             our_latency_us: float, competitor_latency_us: float,
                             profit_per_us_saved: float = 500) -> Dict:
    """
    Calculate ROI of co-location setup.
    """
    latency_advantage = competitor_latency_us - our_latency_us
    daily_profit = max(0, latency_advantage * profit_per_us_saved)
    monthly_profit = daily_profit * 21
    net_monthly = monthly_profit - monthly_cost
    if net_monthly > 0:
        payback_months = setup_cost / net_monthly
    else:
        payback_months = float('inf')
    return {
        'latency_advantage_us': latency_advantage,
        'daily_profit': daily_profit,
        'monthly_profit': monthly_profit,
        'monthly_cost': monthly_cost,
        'net_monthly': net_monthly,
        'setup_cost': setup_cost,
        'payback_months': payback_months
    }

# Compare setups
setups = [
    ('Basic (no FPGA)', 50000, 20000, 15),
    ('Premium (with FPGA)', 150000, 35000, 2),
]
print("Co-Location ROI Analysis")
print("=" * 50)
print(f"Competitor latency: 20 μs")
print(f"Profit per μs saved: $500/day\n")
for name, setup, monthly, latency in setups:
    roi = calculate_colocation_roi(setup, monthly, latency, 20, 500)
    print(f"{name}:")
    print(f" Setup: ${setup:,}, Monthly: ${monthly:,}")
    print(f" Our latency: {latency} μs, Advantage: {roi['latency_advantage_us']} μs")
    print(f" Net monthly: ${roi['net_monthly']:,.0f}")
    print(f" Payback: {roi['payback_months']:.1f} months")
    print()
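A useful companion to the payback figure is the break-even latency advantage: the smallest edge at which monthly profit covers the monthly fee. Under the same linear assumption used above (profit scales with microseconds saved, 21 trading days per month), it can be sketched as follows. The function name breakeven_advantage_us is hypothetical.

```python
def breakeven_advantage_us(monthly_cost: float, profit_per_us_saved: float = 500,
                           trading_days: int = 21) -> float:
    """Latency advantage (in us) at which monthly profit equals monthly cost."""
    return monthly_cost / (profit_per_us_saved * trading_days)


for name, monthly_cost in [('Basic (no FPGA)', 20_000), ('Premium (with FPGA)', 35_000)]:
    edge = breakeven_advantage_us(monthly_cost)
    print(f"{name}: needs at least {edge:.2f} us advantage to cover monthly cost")
```

Any setup whose latency advantage falls below this threshold never pays back its setup cost, which is the case the payback calculation reports as infinity.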
Section 16.3: Common HFT Strategies
HFT strategies exploit speed advantages in various ways.
In this section, you will learn:
- Market making mechanics
- Statistical arbitrage concepts
- Latency arbitrage basics
Strategy Categories
| Strategy | Description | Key Risk |
|---|---|---|
| Market Making | Provide liquidity, earn spread | Inventory risk |
| Statistical Arbitrage | Exploit price discrepancies | Model risk |
| Latency Arbitrage | Trade on info faster | Speed competition |
| Event Arbitrage | React to news quickly | Information risk |
class HFTMarketMaker:
    """Simplified HFT market making simulator."""

    def __init__(self, symbol: str, inventory_limit: int = 1000,
                 base_spread_bps: float = 5, volatility: float = 0.02):
        self.symbol = symbol
        self.inventory_limit = inventory_limit
        self.base_spread_bps = base_spread_bps
        self.volatility = volatility
        self.inventory = 0
        self.cash = 0
        self.trades = []
        self.pnl_history = []

    def calculate_quotes(self, mid_price: float, market_volatility: float = None) -> Dict:
        """Calculate bid and ask quotes with inventory skew."""
        vol = market_volatility or self.volatility
        spread_pct = self.base_spread_bps / 10000
        spread_pct *= vol / 0.02
        inventory_ratio = self.inventory / self.inventory_limit
        skew = inventory_ratio * spread_pct * 0.5
        half_spread = spread_pct / 2
        bid = mid_price * (1 - half_spread - skew)
        ask = mid_price * (1 + half_spread - skew)
        return {'bid': bid, 'ask': ask, 'spread_pct': spread_pct, 'skew': skew, 'mid': mid_price}

    def process_fill(self, side: str, price: float, quantity: int):
        """Process a fill."""
        if side == 'buy':
            self.inventory += quantity
            self.cash -= price * quantity
        else:
            self.inventory -= quantity
            self.cash += price * quantity
        self.trades.append({'side': side, 'price': price, 'quantity': quantity, 'inventory': self.inventory})

    def simulate_session(self, initial_price: float = 100, num_ticks: int = 1000,
                         fill_probability: float = 0.1) -> pd.DataFrame:
        """Simulate a trading session."""
        np.random.seed(42)
        price = initial_price
        for tick in range(num_ticks):
            price *= (1 + np.random.normal(0, self.volatility/100))
            quotes = self.calculate_quotes(price)
            if np.random.random() < fill_probability:
                if np.random.random() < 0.5:
                    if self.inventory < self.inventory_limit:
                        qty = np.random.randint(10, 50)
                        self.process_fill('buy', quotes['bid'], qty)
                else:
                    if self.inventory > -self.inventory_limit:
                        qty = np.random.randint(10, 50)
                        self.process_fill('sell', quotes['ask'], qty)
            mtm_pnl = self.cash + self.inventory * price
            self.pnl_history.append({'tick': tick, 'price': price, 'inventory': self.inventory,
                                     'cash': self.cash, 'mtm_pnl': mtm_pnl})
        return pd.DataFrame(self.pnl_history)

# Run market making simulation
mm = HFTMarketMaker('DEMO', inventory_limit=500, base_spread_bps=5)
results = mm.simulate_session(initial_price=100, num_ticks=1000)
print("Market Making Simulation Results")
print("=" * 50)
print(f"Total Trades: {len(mm.trades)}")
print(f"Final Inventory: {mm.inventory} shares")
print(f"Final Cash: ${mm.cash:,.2f}")
print(f"Final MTM PnL: ${results['mtm_pnl'].iloc[-1]:,.2f}")
# Visualize market making session
fig, axes = plt.subplots(3, 1, figsize=(12, 8), sharex=True)
axes[0].plot(results['tick'], results['price'], linewidth=1)
axes[0].set_ylabel('Price ($)')
axes[0].set_title('Price Evolution')
axes[0].grid(True, alpha=0.3)
axes[1].plot(results['tick'], results['inventory'], linewidth=1, color='orange')
axes[1].axhline(0, color='gray', linestyle='--')
axes[1].fill_between(results['tick'], 0, results['inventory'], alpha=0.3, color='orange')
axes[1].set_ylabel('Inventory')
axes[1].set_title('Inventory Position')
axes[1].grid(True, alpha=0.3)
axes[2].plot(results['tick'], results['mtm_pnl'], linewidth=1, color='green')
axes[2].fill_between(results['tick'], 0, results['mtm_pnl'], alpha=0.3, color='green')
axes[2].set_ylabel('MTM PnL ($)')
axes[2].set_xlabel('Tick')
axes[2].set_title('Mark-to-Market PnL')
axes[2].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
class PairsArbitrage:
"""Simple pairs trading / stat arb strategy."""
def __init__(self, entry_zscore: float = 2.0, exit_zscore: float = 0.5, lookback: int = 100):
self.entry_zscore = entry_zscore
self.exit_zscore = exit_zscore
self.lookback = lookback
self.spread_history = deque(maxlen=lookback)
self.position = 0
def update_and_signal(self, price_a: float, price_b: float, ratio: float = 1.0) -> Dict:
"""Update spread history and generate trading signal."""
spread = price_a - ratio * price_b
self.spread_history.append(spread)
if len(self.spread_history) < self.lookback:
return {'signal': 'wait', 'zscore': None}
spread_array = np.array(self.spread_history)
mean = np.mean(spread_array)
std = np.std(spread_array)
zscore = (spread - mean) / std if std != 0 else 0
signal = 'hold'
if self.position == 0:
if zscore > self.entry_zscore:
signal = 'short_spread'
self.position = -1
elif zscore < -self.entry_zscore:
signal = 'long_spread'
self.position = 1
else:
if self.position == 1 and zscore >= -self.exit_zscore:
signal = 'close_long'
self.position = 0
elif self.position == -1 and zscore <= self.exit_zscore:
signal = 'close_short'
self.position = 0
return {'signal': signal, 'zscore': zscore, 'spread': spread, 'position': self.position}
# Simulate pairs trading
np.random.seed(123)
n_points = 500
common_factor = np.cumsum(np.random.randn(n_points) * 0.5)
noise_a = np.cumsum(np.random.randn(n_points) * 0.2)
noise_b = np.cumsum(np.random.randn(n_points) * 0.2)
price_a = 100 + common_factor + noise_a
price_b = 100 + common_factor + noise_b
pairs = PairsArbitrage(entry_zscore=2.0, exit_zscore=0.5)
signals = [pairs.update_and_signal(pa, pb) for pa, pb in zip(price_a, price_b)]
df_signals = pd.DataFrame(signals)
signal_counts = df_signals['signal'].value_counts()
print("Pairs Trading Simulation")
print("=" * 40)
print("Signal Counts:")
for sig, count in signal_counts.items():
print(f" {sig}: {count}")
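The simulation above relies on the default `ratio=1.0`, which only suits pairs that move one-for-one. A minimal sketch of estimating the hedge ratio by ordinary least squares instead, assuming a static in-sample fit is acceptable for illustration. `estimate_hedge_ratio` and the synthetic data are hypothetical and not part of the course code.

```python
import numpy as np

def estimate_hedge_ratio(price_a: np.ndarray, price_b: np.ndarray) -> float:
    """Slope of the least-squares fit price_a ~ slope * price_b + intercept."""
    slope, _intercept = np.polyfit(price_b, price_a, deg=1)
    return float(slope)

# Synthetic pair (illustrative data only): A moves about 1.5x as much as B
rng = np.random.default_rng(0)
b = 100 + np.cumsum(rng.normal(0, 0.5, 2000))
a = 20 + 1.5 * b + rng.normal(0, 0.3, 2000)

hedge_ratio = estimate_hedge_ratio(a, b)
spread = a - hedge_ratio * b  # what remains after hedging out B
print(f"Estimated hedge ratio: {hedge_ratio:.3f}")
```

The estimate can be passed as `ratio=hedge_ratio` to `update_and_signal`. In live use the ratio would normally be re-estimated on a rolling window of past data only, since fitting on the full sample looks ahead.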
Exercise 16.3: Market Maker Spread Calculator (Guided)
Your Task: Calculate optimal bid-ask spreads based on inventory and volatility.
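The quotes should respond to two inputs:
- Inventory: skew both quotes away from the current position (lower them when long, raise them when short) so that position-reducing trades become more likely
- Volatility: widen the spread when volatility is high, to compensate for risk
Fill in the blanks:
Solution: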
def calculate_optimal_spread(mid_price: float, base_spread_bps: float,
inventory: int, inventory_limit: int,
volatility: float, base_volatility: float = 0.02) -> Dict:
"""
Calculate optimal bid-ask spread.
"""
spread_pct = base_spread_bps / 10000
vol_multiplier = volatility / base_volatility
adjusted_spread = spread_pct * vol_multiplier
inventory_ratio = inventory / inventory_limit
skew = inventory_ratio * adjusted_spread * 0.5
half_spread = adjusted_spread / 2
bid = mid_price * (1 - half_spread - skew)
ask = mid_price * (1 + half_spread - skew)
return {
'bid': bid,
'ask': ask,
'spread_bps': adjusted_spread * 10000,
'skew_bps': skew * 10000,
'vol_multiplier': vol_multiplier
}
# Test with different scenarios
print("Market Maker Spread Analysis")
print("=" * 60)
scenarios = [
("Neutral inventory, low vol", 0, 0.01),
("Long inventory, low vol", 500, 0.01),
("Short inventory, low vol", -500, 0.01),
("Neutral inventory, high vol", 0, 0.04),
("Long inventory, high vol", 500, 0.04),
]
for desc, inv, vol in scenarios:
result = calculate_optimal_spread(100, 5, inv, 1000, vol, 0.02)
print(f"{desc}:")
print(f" Bid: ${result['bid']:.4f}, Ask: ${result['ask']:.4f}")
print(f" Spread: {result['spread_bps']:.2f} bps, Skew: {result['skew_bps']:.2f} bps")
print()
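A short worked example of the skew arithmetic used in the solution above. It is only a sketch that repeats the same formula with fixed numbers (the `quotes` helper is hypothetical) to show that inventory shifts both quotes without changing the width of the spread.

```python
def quotes(mid: float, spread_pct: float, inventory_ratio: float) -> tuple:
    """Bid and ask using the same skew rule as calculate_optimal_spread."""
    skew = inventory_ratio * spread_pct * 0.5
    half_spread = spread_pct / 2
    return mid * (1 - half_spread - skew), mid * (1 + half_spread - skew)

# 5 bps base spread, doubled by volatility (0.04 / 0.02) to 10 bps
spread_pct = (5 / 10000) * (0.04 / 0.02)

for ratio in (0.0, 0.5, 1.0):
    bid, ask = quotes(100.0, spread_pct, ratio)
    print(f"inventory ratio {ratio:.1f}: bid={bid:.4f} ask={ask:.4f} width={ask - bid:.4f}")
```

The width is the same in every row. At an inventory ratio of 1.0 the skew equals the half spread, so the ask sits at the mid price: the maker is offering to sell without any edge in order to shed inventory.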
Section 16.4: Regulatory Considerations
HFT operates in a heavily regulated environment.
In this section, you will learn:
- Key regulations (Reg NMS, MiFID II)
- Prohibited practices (spoofing, layering)
- Pre-trade risk controls
Prohibited Practices
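- Spoofing: Placing orders with the intent to cancel them before execution, to mislead other participants about supply or demand
- Layering: A form of spoofing that stacks non-genuine orders at several price levels to create a false impression of supply or demand
- Quote Stuffing: Flooding an exchange with orders and cancellations to slow down other participants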
class ComplianceChecker:
"""Basic compliance checking for trading activities."""
def __init__(self, config: Dict = None):
self.config = config or {
'max_order_rate_per_second': 100,
'max_cancel_ratio': 0.9,
'max_position_size': 10000,
'max_daily_orders': 50000,
}
self.order_timestamps = deque(maxlen=1000)
self.orders_sent = 0
self.orders_canceled = 0
self.orders_filled = 0
self.violations = []
def check_order_rate(self) -> Dict:
"""Check if order rate exceeds limit."""
now = time.time()
while self.order_timestamps and (now - self.order_timestamps[0]) > 1:
self.order_timestamps.popleft()
current_rate = len(self.order_timestamps)
max_rate = self.config['max_order_rate_per_second']
if current_rate >= max_rate:
return {'passed': False, 'reason': f'Order rate {current_rate}/s exceeds limit {max_rate}/s'}
return {'passed': True}
def check_cancel_ratio(self) -> Dict:
"""Check if cancellation ratio is suspicious."""
if self.orders_sent < 100:
return {'passed': True}
cancel_ratio = self.orders_canceled / self.orders_sent
max_ratio = self.config['max_cancel_ratio']
if cancel_ratio > max_ratio:
return {'passed': False, 'reason': f'Cancel ratio {cancel_ratio:.1%} exceeds limit {max_ratio:.1%}'}
return {'passed': True}
def check_position_limit(self, current_position: int, order_qty: int) -> Dict:
"""Check if order would exceed position limits."""
max_pos = self.config['max_position_size']
resulting_position = current_position + order_qty
if abs(resulting_position) > max_pos:
return {'passed': False, 'reason': f'Position {resulting_position} exceeds limit {max_pos}'}
return {'passed': True}
    def pre_order_check(self, order: Dict) -> Dict:
        """Run all pre-order compliance checks."""
        # Sells reduce the position, so sign the quantity by side
        quantity = order.get('quantity', 0)
        signed_qty = -quantity if order.get('side') == 'sell' else quantity
        checks = [
            ('order_rate', self.check_order_rate()),
            ('cancel_ratio', self.check_cancel_ratio()),
            ('position_limit', self.check_position_limit(
                order.get('current_position', 0), signed_qty))
        ]
        all_passed = all(c[1]['passed'] for c in checks)
        failed_checks = [(name, result) for name, result in checks if not result['passed']]
        if not all_passed:
            self.violations.append({'timestamp': datetime.now(), 'order': order, 'failed_checks': failed_checks})
        return {'approved': all_passed, 'checks': dict(checks), 'failed': failed_checks}
def record_order(self):
self.orders_sent += 1
self.order_timestamps.append(time.time())
def record_cancel(self):
self.orders_canceled += 1
def record_fill(self):
self.orders_filled += 1
def summary(self) -> Dict:
return {
'orders_sent': self.orders_sent,
'orders_canceled': self.orders_canceled,
'orders_filled': self.orders_filled,
'cancel_ratio': self.orders_canceled / max(1, self.orders_sent),
'fill_ratio': self.orders_filled / max(1, self.orders_sent),
'violations': len(self.violations)
}
# Demo compliance checker
checker = ComplianceChecker()
for i in range(200):
order = {
'symbol': 'TEST',
'side': 'buy',
'quantity': 100,
'current_position': i * 50 if i < 50 else 2500
}
result = checker.pre_order_check(order)
if result['approved']:
checker.record_order()
if np.random.random() < 0.85:
checker.record_cancel()
else:
checker.record_fill()
print("Compliance Summary")
print("=" * 40)
summary = checker.summary()
for key, value in summary.items():
if isinstance(value, float):
print(f"{key}: {value:.1%}")
else:
print(f"{key}: {value}")
if checker.violations:
print(f"\nWarning: {len(checker.violations)} compliance violations detected!")
Exercise 16.4: Spoofing Detector (Open-ended)
Your Task:
Build a class that detects potential spoofing behavior by analyzing order patterns.
The detector should:
- Track orders and their outcomes (filled vs canceled)
- Flag suspicious patterns (high cancel rate, short order lifetime)
- Calculate a spoofing risk score
Your implementation:
Solution:
class SpoofingDetector:
"""Detect potential spoofing behavior."""
def __init__(self, cancel_threshold: float = 0.9, min_lifetime_ms: float = 100):
self.cancel_threshold = cancel_threshold
self.min_lifetime_ms = min_lifetime_ms
self.orders = {}
self.canceled_count = 0
self.filled_count = 0
self.short_lived_count = 0
def record_order(self, order_id: str, timestamp: datetime,
side: str, price: float, quantity: int):
"""Record a new order."""
self.orders[order_id] = {
'timestamp': timestamp,
'side': side,
'price': price,
'quantity': quantity,
'status': 'active'
}
def record_cancel(self, order_id: str, timestamp: datetime):
"""Record order cancellation."""
if order_id not in self.orders:
return
order = self.orders[order_id]
lifetime_ms = (timestamp - order['timestamp']).total_seconds() * 1000
order['status'] = 'canceled'
order['lifetime_ms'] = lifetime_ms
self.canceled_count += 1
if lifetime_ms < self.min_lifetime_ms:
self.short_lived_count += 1
def record_fill(self, order_id: str, timestamp: datetime):
"""Record order fill."""
if order_id not in self.orders:
return
order = self.orders[order_id]
lifetime_ms = (timestamp - order['timestamp']).total_seconds() * 1000
order['status'] = 'filled'
order['lifetime_ms'] = lifetime_ms
self.filled_count += 1
def calculate_risk_score(self) -> Dict:
"""Calculate spoofing risk score."""
total_orders = self.canceled_count + self.filled_count
if total_orders == 0:
return {'score': 0, 'flags': []}
cancel_ratio = self.canceled_count / total_orders
short_lived_ratio = self.short_lived_count / total_orders
flags = []
score = 0
if cancel_ratio > self.cancel_threshold:
flags.append(f'High cancel ratio: {cancel_ratio:.1%}')
score += 50
if short_lived_ratio > 0.5:
flags.append(f'Many short-lived orders: {short_lived_ratio:.1%}')
score += 30
if cancel_ratio > 0.95 and short_lived_ratio > 0.7:
flags.append('CRITICAL: Pattern consistent with spoofing')
score += 20
return {
'score': min(100, score),
'cancel_ratio': cancel_ratio,
'short_lived_ratio': short_lived_ratio,
'flags': flags,
'total_orders': total_orders
}
# Test the detector
detector = SpoofingDetector(cancel_threshold=0.9, min_lifetime_ms=100)
base_time = datetime.now()
for i in range(100):
order_time = base_time + timedelta(milliseconds=i*10)
detector.record_order(f'order_{i}', order_time, 'buy', 100.0, 100)
if np.random.random() < 0.95:
cancel_time = order_time + timedelta(milliseconds=np.random.uniform(10, 50))
detector.record_cancel(f'order_{i}', cancel_time)
else:
fill_time = order_time + timedelta(milliseconds=np.random.uniform(100, 500))
detector.record_fill(f'order_{i}', fill_time)
result = detector.calculate_risk_score()
print("Spoofing Detection Results")
print("=" * 50)
print(f"Risk Score: {result['score']}/100")
print(f"Cancel Ratio: {result['cancel_ratio']:.1%}")
for flag in result['flags']:
print(f" - {flag}")
Exercise 16.5: Latency Anomaly Detector (Open-ended)
Your Task:
Build a class that detects latency anomalies (spikes) in a trading system.
The detector should:
- Track latency measurements over time
- Identify anomalies using percentile thresholds
- Provide statistics and anomaly counts
Your implementation:
Solution:
class LatencyAnomalyDetector:
"""Detect latency anomalies in trading systems."""
def __init__(self, window_size: int = 1000, anomaly_percentile: float = 99):
self.window_size = window_size
self.anomaly_percentile = anomaly_percentile
self.latencies = deque(maxlen=window_size)
self.timestamps = deque(maxlen=window_size)
self.anomaly_count = 0
self.total_count = 0
    def record_latency(self, timestamp: datetime, latency_us: float):
        """Record a latency measurement."""
        # Test against the existing window first, so the new value
        # does not influence its own threshold
        if self.is_anomaly(latency_us):
            self.anomaly_count += 1
        self.latencies.append(latency_us)
        self.timestamps.append(timestamp)
        self.total_count += 1
def is_anomaly(self, latency_us: float) -> bool:
"""Check if latency is an anomaly."""
if len(self.latencies) < 100:
return False
threshold = np.percentile(list(self.latencies), self.anomaly_percentile)
return latency_us > threshold
def get_statistics(self) -> Dict:
"""Get latency statistics."""
if not self.latencies:
return {}
arr = np.array(self.latencies)
return {
'count': len(arr),
'mean_us': np.mean(arr),
'median_us': np.median(arr),
'std_us': np.std(arr),
'min_us': np.min(arr),
'max_us': np.max(arr),
'p95_us': np.percentile(arr, 95),
'p99_us': np.percentile(arr, 99),
'anomaly_count': self.anomaly_count,
'anomaly_rate': self.anomaly_count / max(1, self.total_count)
}
# Test
detector = LatencyAnomalyDetector(window_size=1000, anomaly_percentile=99)
np.random.seed(42)
base_time = datetime.now()
for i in range(2000):
timestamp = base_time + timedelta(microseconds=i*100)
base_latency = 10 * np.random.lognormal(0, 0.3)
if np.random.random() < 0.02:
latency = base_latency * np.random.uniform(5, 20)
else:
latency = base_latency
detector.record_latency(timestamp, latency)
stats = detector.get_statistics()
print("Latency Anomaly Detection")
print("=" * 50)
print(f"Mean: {stats['mean_us']:.2f} μs")
print(f"P99: {stats['p99_us']:.2f} μs")
print(f"Anomalies: {stats['anomaly_count']} ({stats['anomaly_rate']:.1%})")
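A rolling percentile threshold flags roughly the top 1% of any window, including a perfectly healthy one, so its anomaly rate says little on its own. The sketch below compares it with a different technique, a median plus scaled MAD (median absolute deviation) threshold. The multiplier `k=6.0` and the synthetic data are assumptions chosen for illustration, not values from the course.

```python
import numpy as np

def mad_threshold(latencies_us: np.ndarray, k: float = 6.0) -> float:
    """Median + k * scaled MAD; 1.4826 makes the MAD comparable to a std for normal data."""
    median = np.median(latencies_us)
    mad = np.median(np.abs(latencies_us - median))
    return float(median + k * 1.4826 * mad)

rng = np.random.default_rng(42)
healthy = 10 * rng.lognormal(0, 0.3, 5000)           # no injected spikes
spiky = healthy.copy()
spike_idx = rng.choice(spiky.size, size=100, replace=False)
spiky[spike_idx] *= rng.uniform(5, 20, size=100)     # 2% of samples spiked 5x to 20x

for name, data in (("healthy", healthy), ("spiky", spiky)):
    pct_flags = int((data > np.percentile(data, 99)).sum())
    mad_flags = int((data > mad_threshold(data)).sum())
    print(f"{name}: percentile flags={pct_flags}, MAD flags={mad_flags}")
```

The percentile rule reports about the same number of anomalies for both series, while the MAD rule stays near zero on the healthy series and catches most of the injected spikes. Which rule fits better depends on whether the goal is to rank the worst samples or to detect a change in behavior.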
Exercise 16.6: Complete HFT System Analyzer (Open-ended)
Your Task:
Build a comprehensive HFT system analyzer that combines latency profiling, compliance checking, and performance metrics.
The analyzer should:
- Profile latency across multiple components
- Run compliance checks on simulated orders
- Detect latency anomalies
- Generate a comprehensive report
Your implementation:
Solution:
class HFTSystemAnalyzer:
"""Comprehensive HFT system analysis tool."""
def __init__(self, name: str = "HFT System"):
self.name = name
self.profiler = LatencyProfiler()
self.compliance = ComplianceChecker()
self.components = [
'market_data_recv', 'data_parsing', 'strategy_compute',
'risk_check', 'order_send', 'order_ack'
]
for comp in self.components:
self.profiler.add_component(comp)
self.tick_data = []
def simulate_tick(self, base_latencies: Dict = None) -> Dict:
"""Simulate a single tick."""
base = base_latencies or {
'market_data_recv': 5, 'data_parsing': 2, 'strategy_compute': 10,
'risk_check': 1, 'order_send': 3, 'order_ack': 5
}
tick_latencies = {}
total_latency = 0
for comp in self.components:
base_us = base.get(comp, 5)
actual_us = base_us * np.random.lognormal(0, 0.3)
actual_ns = int(actual_us * 1000)
self.profiler.get_component(comp).record(actual_ns)
tick_latencies[comp] = actual_us
total_latency += actual_us
tick_latencies['total'] = total_latency
self.tick_data.append(tick_latencies)
return tick_latencies
def run_simulation(self, num_ticks: int = 10000) -> pd.DataFrame:
"""Run full simulation."""
for _ in range(num_ticks):
self.simulate_tick()
return pd.DataFrame(self.tick_data)
def latency_breakdown(self) -> pd.DataFrame:
"""Get latency breakdown."""
breakdown = []
total_mean = 0
for comp in self.components:
stats = self.profiler.get_component(comp).statistics()
if stats:
breakdown.append({
'component': comp,
'mean_us': stats['mean_us'],
'p99_us': stats['p99_ns'] / 1000
})
total_mean += stats['mean_us']
for item in breakdown:
item['pct_of_total'] = item['mean_us'] / total_mean * 100
return pd.DataFrame(breakdown)
def generate_report(self) -> str:
"""Generate comprehensive report."""
report = [
"=" * 70,
f"HFT SYSTEM ANALYSIS REPORT: {self.name}",
"=" * 70,
"",
"COMPONENT SUMMARY",
"-" * 50,
self.profiler.summary().to_string(index=False),
"",
"LATENCY BREAKDOWN",
"-" * 50
]
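        breakdown = self.latency_breakdown()
        total_mean = breakdown['mean_us'].sum()
        # P99 of the end-to-end latency per tick; summing per-component P99s would overstate it
        total_p99 = float(np.percentile([t['total'] for t in self.tick_data], 99))
        report.append(breakdown.to_string(index=False))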
report.extend([
"",
f"Total Mean Latency: {total_mean:.2f} μs",
f"Total P99 Latency: {total_p99:.2f} μs",
"",
"RECOMMENDATIONS",
"-" * 50
])
bottleneck = breakdown.loc[breakdown['mean_us'].idxmax()]
report.append(f"1. Primary bottleneck: {bottleneck['component']} ({bottleneck['pct_of_total']:.1f}%)")
if total_mean > 50:
report.append("2. Consider FPGA acceleration")
if total_p99 > total_mean * 3:
report.append("3. High jitter - investigate sources")
report.extend(["", "=" * 70])
return "\n".join(report)
# Run
analyzer = HFTSystemAnalyzer("Production Trading System")
df = analyzer.run_simulation(10000)
print(analyzer.generate_report())
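Means add across components, but percentiles do not: the components rarely hit their worst case on the same tick. A minimal sketch, assuming independent lognormal jitter per component with the same base latencies as `simulate_tick`, that compares the sum of per-component P99s with the P99 of the end-to-end total.

```python
import numpy as np

rng = np.random.default_rng(7)
base_us = {'market_data_recv': 5, 'data_parsing': 2, 'strategy_compute': 10,
           'risk_check': 1, 'order_send': 3, 'order_ack': 5}

# One independent latency series per component, in microseconds
samples = {name: base * rng.lognormal(0, 0.3, 10_000) for name, base in base_us.items()}
total = np.sum(list(samples.values()), axis=0)  # end-to-end latency per tick

sum_of_p99 = sum(np.percentile(s, 99) for s in samples.values())
p99_of_total = np.percentile(total, 99)
print(f"Sum of component P99s: {sum_of_p99:.2f} us")
print(f"P99 of end-to-end:     {p99_of_total:.2f} us")
```

Under these assumptions the summed figure is noticeably larger than the true end-to-end P99. If components are strongly correlated (for example, all slow during a garbage collection pause), the two numbers move closer together, so the gap itself is a useful diagnostic.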
Module Project: Complete HFT Analysis Suite
Build a production-ready HFT analysis system combining all concepts.
# YOUR CODE HERE - Module Project
# Build a complete HFT analysis suite that:
# 1. Profiles latency across all system components
# 2. Simulates market making with inventory management
# 3. Runs compliance checks on all orders
# 4. Detects latency anomalies and spoofing patterns
# 5. Generates a comprehensive report with recommendations
Key Takeaways
What You Learned
1. Latency and Speed
- In HFT, microseconds matter - 10μs can mean profit or loss
- Measure latency at each component to identify bottlenecks
- Focus on tail latencies (P99) not just mean
2. Co-Location
- Physical proximity to exchanges dramatically reduces latency
- Speed of light limits minimum latency based on distance
- Co-location is expensive but essential for competitive HFT
3. HFT Strategies
- Market making: Provide liquidity, earn the spread
- Statistical arbitrage: Exploit price discrepancies
- Latency arbitrage: React faster than others
4. Regulatory Compliance
- HFT is heavily regulated
- Prohibited: spoofing, layering, quote stuffing
- Pre-trade risk controls are required
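The speed-of-light limit from the co-location takeaway can be made concrete with a short calculation. This is a sketch of the physical lower bound only; the distances are illustrative, and the fiber refractive index of 1.47 is an assumed typical value. Real links add switching, serialization, and routing delays on top.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum

def min_one_way_latency_us(distance_km: float, refractive_index: float = 1.0) -> float:
    """Lower bound on one-way propagation delay, in microseconds."""
    if distance_km < 0 or refractive_index < 1.0:
        raise ValueError("distance must be >= 0 and refractive index >= 1")
    speed_km_s = SPEED_OF_LIGHT_KM_S / refractive_index
    return distance_km / speed_km_s * 1e6

# Illustrative distances, not measurements of any real venue
for label, km in (("same data center", 0.1), ("across a metro area", 50), ("between cities", 1000)):
    vacuum = min_one_way_latency_us(km)
    fiber = min_one_way_latency_us(km, refractive_index=1.47)  # assumed fiber index
    print(f"{label:>20} ({km:g} km): vacuum {vacuum:9.2f} us, fiber {fiber:9.2f} us")
```

No amount of software tuning recovers propagation delay, which is why distance to the matching engine dominates once the code path is down to tens of microseconds.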
Reality Check
- Building HFT systems requires millions in capital
- Competition has compressed returns
- Most retail traders should focus on longer-term strategies
Coming Up Next
In Module 17: Cloud Deployment, we'll learn how to deploy trading systems in the cloud - a more accessible approach for most traders.
Congratulations on completing Module 16!
Module 17: Cloud Deployment
Part 5: Production & Infrastructure
| Attribute | Value |
|---|---|
| Duration | ~2.5 hours |
| Exercises | 6 (3 guided + 3 open-ended) |
Learning Objectives
By the end of this module, you will be able to:
- Design cloud architectures for trading systems on AWS/GCP/Azure
- Containerize trading applications with Docker
- Build serverless functions for event-driven workflows
- Implement CI/CD pipelines for automated deployment
# Environment setup
import os
import json
import yaml
import hashlib
from datetime import datetime
from dataclasses import dataclass, field
from typing import List, Dict, Optional
import warnings
warnings.filterwarnings('ignore')
print("Module 17: Cloud Deployment")
print("=" * 40)
print()
print("Note: This module covers cloud concepts and configuration.")
print("Actual deployment requires cloud provider accounts.")
Section 17.1: Cloud Architecture
Major Cloud Providers
| Provider | Strengths | Best For |
|---|---|---|
| AWS | Most services, market leader | General purpose, enterprise |
| GCP | Data analytics, ML, networking | Data-heavy applications |
| Azure | Microsoft integration, enterprise | Corporate environments |
Key Services for Trading Systems
| Service Type | AWS | GCP | Azure |
|---|---|---|---|
| Compute | EC2 | Compute Engine | Virtual Machines |
| Serverless | Lambda | Cloud Functions | Functions |
| Containers | ECS/EKS | GKE | AKS |
| Database | RDS/DynamoDB | Cloud SQL/Firestore | SQL Database |
| Message Queue | SQS/SNS | Pub/Sub | Service Bus |
| Storage | S3 | Cloud Storage | Blob Storage |
@dataclass
class CloudService:
"""Represents a cloud service configuration."""
name: str
provider: str # aws, gcp, azure
service_type: str
tier: str
region: str
config: Dict = field(default_factory=dict)
def estimated_monthly_cost(self) -> float:
"""Estimate monthly cost based on tier."""
# Simplified cost estimates
base_costs = {
'compute': {'small': 50, 'medium': 150, 'large': 400},
'serverless': {'small': 10, 'medium': 50, 'large': 200},
'database': {'small': 30, 'medium': 100, 'large': 500},
'storage': {'small': 5, 'medium': 25, 'large': 100},
'queue': {'small': 1, 'medium': 10, 'large': 50},
}
return base_costs.get(self.service_type, {}).get(self.tier, 0)
class CloudArchitecture:
"""
Design and plan cloud architecture for trading systems.
"""
def __init__(self, name: str, provider: str = 'aws'):
self.name = name
self.provider = provider
self.services: List[CloudService] = []
self.connections: List[tuple] = [] # (from_service, to_service)
def add_service(self, name: str, service_type: str,
tier: str = 'medium', region: str = 'us-east-1',
config: Dict = None) -> CloudService:
"""Add a service to the architecture."""
service = CloudService(
name=name,
provider=self.provider,
service_type=service_type,
tier=tier,
region=region,
config=config or {}
)
self.services.append(service)
return service
def connect(self, from_service: str, to_service: str):
"""Define a connection between services."""
self.connections.append((from_service, to_service))
def total_estimated_cost(self) -> float:
"""Calculate total estimated monthly cost."""
return sum(s.estimated_monthly_cost() for s in self.services)
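    def generate_terraform(self) -> str:
        """Generate a skeleton Terraform configuration.

        The output is illustrative and will not apply as-is: the AMI ID is a
        placeholder, the provider region is hard-coded rather than taken from
        each service, and required arguments (for example the Lambda `role`
        and deployment package, or the RDS credentials) are omitted.
        """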
tf_config = []
tf_config.append(f'# Terraform configuration for {self.name}')
tf_config.append(f'# Provider: {self.provider}')
tf_config.append('')
# Provider block
if self.provider == 'aws':
tf_config.append('provider "aws" {')
tf_config.append(' region = "us-east-1"')
tf_config.append('}')
elif self.provider == 'gcp':
tf_config.append('provider "google" {')
tf_config.append(' project = "your-project-id"')
tf_config.append(' region = "us-central1"')
tf_config.append('}')
tf_config.append('')
# Resource blocks
for service in self.services:
tf_config.append(f'# {service.name} ({service.service_type})')
resource_name = service.name.lower().replace(' ', '_').replace('-', '_')
if service.service_type == 'compute':
if self.provider == 'aws':
tf_config.append(f'resource "aws_instance" "{resource_name}" {{')
tf_config.append(f' ami = "ami-0c55b159cbfafe1f0"')
tf_config.append(f' instance_type = "{self._get_instance_type(service.tier)}"')
tf_config.append(f' tags = {{')
tf_config.append(f' Name = "{service.name}"')
tf_config.append(f' }}')
tf_config.append('}')
elif service.service_type == 'serverless':
if self.provider == 'aws':
tf_config.append(f'resource "aws_lambda_function" "{resource_name}" {{')
tf_config.append(f' function_name = "{resource_name}"')
tf_config.append(f' runtime = "python3.9"')
tf_config.append(f' handler = "handler.main"')
tf_config.append(f' memory_size = {self._get_lambda_memory(service.tier)}')
tf_config.append('}')
elif service.service_type == 'database':
if self.provider == 'aws':
tf_config.append(f'resource "aws_db_instance" "{resource_name}" {{')
tf_config.append(f' identifier = "{resource_name}"')
tf_config.append(f' engine = "postgres"')
tf_config.append(f' instance_class = "{self._get_db_instance(service.tier)}"')
tf_config.append(f' allocated_storage = 20')
tf_config.append('}')
tf_config.append('')
return '\n'.join(tf_config)
def _get_instance_type(self, tier):
"""Get AWS instance type for tier."""
mapping = {'small': 't3.micro', 'medium': 't3.medium', 'large': 't3.xlarge'}
return mapping.get(tier, 't3.medium')
def _get_lambda_memory(self, tier):
"""Get Lambda memory for tier."""
mapping = {'small': 128, 'medium': 512, 'large': 2048}
return mapping.get(tier, 512)
def _get_db_instance(self, tier):
"""Get RDS instance class for tier."""
mapping = {'small': 'db.t3.micro', 'medium': 'db.t3.medium', 'large': 'db.r5.large'}
return mapping.get(tier, 'db.t3.medium')
def display_architecture(self):
"""Display architecture summary."""
print(f"Cloud Architecture: {self.name}")
print(f"Provider: {self.provider.upper()}")
print("=" * 50)
print()
print("Services:")
for s in self.services:
cost = s.estimated_monthly_cost()
print(f" [{s.service_type.upper()}] {s.name}")
print(f" Tier: {s.tier}, Region: {s.region}")
print(f" Est. Cost: ${cost}/month")
print()
if self.connections:
print("Connections:")
for from_s, to_s in self.connections:
print(f" {from_s} -> {to_s}")
print()
print(f"Total Estimated Cost: ${self.total_estimated_cost()}/month")
# Design a trading system architecture
arch = CloudArchitecture("Quantitative Trading Platform", provider='aws')
# Add services
arch.add_service("Market Data Processor", "serverless", tier="medium")
arch.add_service("Strategy Engine", "compute", tier="medium")
arch.add_service("Order Manager", "serverless", tier="small")
arch.add_service("Trade Database", "database", tier="medium")
arch.add_service("Market Data Storage", "storage", tier="medium")
arch.add_service("Event Queue", "queue", tier="medium")
# Define connections
arch.connect("Market Data Processor", "Event Queue")
arch.connect("Event Queue", "Strategy Engine")
arch.connect("Strategy Engine", "Order Manager")
arch.connect("Strategy Engine", "Trade Database")
arch.connect("Market Data Processor", "Market Data Storage")
arch.display_architecture()
# Generate Terraform configuration
print("Generated Terraform Configuration:")
print("=" * 50)
print(arch.generate_terraform())
Exercise 17.1: Service Cost Calculator (Guided)
Create a function that calculates optimal service tiers based on budget constraints.
Solution:
def calculate_optimal_tiers(services: Dict[str, str], budget: float) -> Dict:
    """
    Pick the highest tier each service can afford from an equal share of the budget.

    This is a simple heuristic, not a true optimum. Every service falls back to
    'small' even when that exceeds its share, so check 'within_budget'.

    Args:
        services: Dict of {service_name: service_type}
        budget: Monthly budget in dollars
    Returns:
        Dict with allocations and analysis
    """
    costs = {
        'compute': {'small': 50, 'medium': 150, 'large': 400},
        'serverless': {'small': 10, 'medium': 50, 'large': 200},
        'database': {'small': 30, 'medium': 100, 'large': 500},
        'storage': {'small': 5, 'medium': 25, 'large': 100},
        'queue': {'small': 1, 'medium': 10, 'large': 50},
    }
    tiers = ['small', 'medium', 'large']
    allocations = {}
    total_cost = 0
    # Guard against an empty service list (avoids division by zero)
    budget_per_service = budget / len(services) if services else 0
    for name, svc_type in services.items():
        # Unknown service types are priced as compute
        type_costs = costs.get(svc_type, costs['compute'])
        selected_tier = 'small'
        for tier in tiers:
            # Costs rise with tier, so the last affordable tier is the largest
            if type_costs[tier] <= budget_per_service:
                selected_tier = tier
        allocations[name] = {
            'type': svc_type,
            'tier': selected_tier,
            'cost': type_costs[selected_tier]
        }
        total_cost += type_costs[selected_tier]
    return {
        'allocations': allocations,
        'total_cost': total_cost,
        'budget': budget,
        'remaining': budget - total_cost,
        'within_budget': total_cost <= budget,
        'utilization': (total_cost / budget) * 100 if budget > 0 else 0.0
    }
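Splitting the budget equally can leave money unspent when cheap services cannot use their share. A sketch of one alternative heuristic, assuming the same cost table and only known service types: start every service at `small`, then repeatedly buy the cheapest available upgrade while the budget allows. `allocate_greedy` is a hypothetical helper, and cheapest-first is itself a heuristic rather than an optimal allocation.

```python
from typing import Dict

COSTS = {
    'compute': {'small': 50, 'medium': 150, 'large': 400},
    'serverless': {'small': 10, 'medium': 50, 'large': 200},
    'database': {'small': 30, 'medium': 100, 'large': 500},
    'storage': {'small': 5, 'medium': 25, 'large': 100},
    'queue': {'small': 1, 'medium': 10, 'large': 50},
}
TIERS = ['small', 'medium', 'large']

def allocate_greedy(services: Dict[str, str], budget: float) -> Dict[str, str]:
    """Start at 'small' everywhere, then buy the cheapest next upgrade while budget remains."""
    tier_idx = {name: 0 for name in services}  # index into TIERS
    spent = sum(COSTS[svc_type]['small'] for svc_type in services.values())
    if spent > budget:
        raise ValueError(f"Budget {budget} cannot cover the minimum cost {spent}")
    while True:
        # Price of moving each service up one tier, where an upgrade exists
        upgrades = [
            (COSTS[services[name]][TIERS[idx + 1]] - COSTS[services[name]][TIERS[idx]], name)
            for name, idx in tier_idx.items() if idx + 1 < len(TIERS)
        ]
        affordable = [u for u in upgrades if u[0] <= budget - spent]
        if not affordable:
            break
        step_cost, name = min(affordable)
        tier_idx[name] += 1
        spent += step_cost
    return {name: TIERS[idx] for name, idx in tier_idx.items()}

plan = allocate_greedy({'engine': 'compute', 'db': 'database', 'queue': 'queue'}, budget=300)
print(plan)
```

For this example the greedy plan spends the full 300, whereas an equal split of 100 per service would leave the compute service at `small` because its `medium` tier costs 150.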
Section 17.2: Containerization
Containers package your application with all its dependencies, ensuring it runs the same everywhere.
Why Docker?
- Consistency: "Works on my machine" becomes "works everywhere"
- Isolation: Each service runs in its own environment
- Portability: Move between cloud providers easily
- Scaling: Spin up multiple instances quickly
class DockerfileGenerator:
"""
Generate Dockerfiles for trading applications.
"""
TEMPLATES = {
'python-trading': '''
# Python Trading Application
FROM python:3.10-slim
# Set working directory
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \\
gcc \\
&& rm -rf /var/lib/apt/lists/*
# Copy requirements first (for caching)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY . .
# Set environment variables
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1
# Expose port
EXPOSE {port}
# Run the application as a module (entrypoint is a module name, e.g. main for main.py)
CMD ["python", "-m", "{entrypoint}"]
''',
'python-api': '''
# Python API Service
FROM python:3.10-slim
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application
COPY . .
# Environment
ENV PYTHONUNBUFFERED=1
EXPOSE {port}
# Use gunicorn for production
CMD ["gunicorn", "--bind", "0.0.0.0:{port}", "--workers", "4", "{entrypoint}:app"]
''',
'python-jupyter': '''
# Jupyter Notebook for Research
FROM python:3.10
WORKDIR /notebooks
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
RUN pip install jupyter
# Copy notebooks
COPY . .
EXPOSE 8888
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root"]
'''
}
@classmethod
def generate(cls, template_name: str, port: int = 8000,
entrypoint: str = 'main') -> str:
"""Generate Dockerfile from template."""
template = cls.TEMPLATES.get(template_name, cls.TEMPLATES['python-trading'])
return template.format(port=port, entrypoint=entrypoint)
@classmethod
def generate_requirements(cls, packages: List[str]) -> str:
"""Generate requirements.txt content."""
return '\n'.join(packages)
class DockerComposeGenerator:
"""
Generate Docker Compose configurations for multi-service applications.
"""
def __init__(self, project_name: str):
self.project_name = project_name
self.services = {}
self.networks = ['default']
self.volumes = []
def add_service(self, name: str, image: str = None, build: str = None,
ports: List[str] = None, environment: Dict = None,
depends_on: List[str] = None, volumes: List[str] = None,
command: str = None):
"""Add a service to the compose file."""
service = {}
if image:
service['image'] = image
if build:
service['build'] = build
if ports:
service['ports'] = ports
if environment:
service['environment'] = environment
if depends_on:
service['depends_on'] = depends_on
if volumes:
service['volumes'] = volumes
if command:
service['command'] = command
service['restart'] = 'unless-stopped'
self.services[name] = service
def add_volume(self, name: str):
"""Add a named volume."""
self.volumes.append(name)
def generate(self) -> str:
"""Generate docker-compose.yml content."""
compose = {
'version': '3.8',
'services': self.services
}
if self.volumes:
compose['volumes'] = {v: {} for v in self.volumes}
return yaml.dump(compose, default_flow_style=False, sort_keys=False)
# Generate Dockerfile for a trading application
print("Dockerfile for Trading Application:")
print("=" * 50)
dockerfile = DockerfileGenerator.generate('python-trading', port=8080, entrypoint='trading_bot')
print(dockerfile)
# Generate Docker Compose for complete trading system
compose = DockerComposeGenerator("trading-system")
# Database
compose.add_service(
name='postgres',
image='postgres:15',
ports=['5432:5432'],
environment={
'POSTGRES_DB': 'trading',
'POSTGRES_USER': 'trader',
'POSTGRES_PASSWORD': '${DB_PASSWORD}'
},
volumes=['postgres_data:/var/lib/postgresql/data']
)
# Redis for caching
compose.add_service(
name='redis',
image='redis:7-alpine',
ports=['6379:6379']
)
# Market data collector
compose.add_service(
name='data-collector',
build='./data_collector',
environment={
'API_KEY': '${MARKET_API_KEY}',
'REDIS_URL': 'redis://redis:6379'
},
depends_on=['redis']
)
# Strategy engine
compose.add_service(
name='strategy-engine',
build='./strategy',
environment={
'DATABASE_URL': 'postgresql://trader:${DB_PASSWORD}@postgres:5432/trading',
'REDIS_URL': 'redis://redis:6379'
},
depends_on=['postgres', 'redis', 'data-collector']
)
# API server
compose.add_service(
name='api',
build='./api',
ports=['8000:8000'],
environment={
'DATABASE_URL': 'postgresql://trader:${DB_PASSWORD}@postgres:5432/trading',
'SECRET_KEY': '${API_SECRET_KEY}'
},
depends_on=['postgres']
)
# Dashboard
compose.add_service(
name='dashboard',
build='./dashboard',
ports=['8050:8050'],
environment={
'API_URL': 'http://api:8000'
},
depends_on=['api']
)
compose.add_volume('postgres_data')
print("Docker Compose Configuration:")
print("=" * 50)
print(compose.generate())
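One caveat about the configuration above: the list form of `depends_on` only controls start order. It does not wait until Postgres is ready to accept connections. The sketch below shows one common remedy, assuming the `pg_isready` utility that ships in the official `postgres` image and the long-form `depends_on` syntax from the Compose Specification. The helper name `with_db_healthcheck` is hypothetical, and it works on a plain services dict rather than on `DockerComposeGenerator`.

```python
import copy
from typing import Dict


def with_db_healthcheck(services: Dict[str, dict], db_name: str = "postgres",
                        user: str = "trader", database: str = "trading") -> Dict[str, dict]:
    """Return a copy of `services` in which dependents wait for a healthy database."""
    result = copy.deepcopy(services)

    # Health probe: pg_isready exits 0 once the server accepts connections.
    result[db_name]["healthcheck"] = {
        "test": ["CMD-SHELL", f"pg_isready -U {user} -d {database}"],
        "interval": "5s",
        "timeout": "3s",
        "retries": 10,
    }

    for service in result.values():
        deps = service.get("depends_on")
        if not isinstance(deps, list) or db_name not in deps:
            continue
        # Convert the short list form into the long form with per-dependency conditions.
        service["depends_on"] = {
            dep: {"condition": "service_healthy" if dep == db_name else "service_started"}
            for dep in deps
        }
    return result


# Illustrative input: a reduced version of the trading stack
services = {
    "postgres": {"image": "postgres:15"},
    "redis": {"image": "redis:7-alpine"},
    "api": {"build": "./api", "depends_on": ["postgres", "redis"]},
}
patched = with_db_healthcheck(services)
print(patched["api"]["depends_on"])
```

The input dict is left untouched, so you can compare the two versions before writing the file.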
Exercise 17.2: Docker Service Builder (Guided)
Build a function that generates Docker Compose service configurations with proper dependencies.
Solution:
def build_docker_services(components: List[Dict]) -> Dict:
    """
    Build Docker Compose services with dependency resolution.

    Args:
        components: List of component specs with name, type, dependencies

    Returns:
        Docker Compose services dict
    """
    # Default port per component type (None means no published port)
    default_ports = {
        'api': 8000,
        'database': 5432,
        'cache': 6379,
        'dashboard': 8050,
        'worker': None
    }
    # Component types that run from a public image instead of a local build
    default_images = {
        'database': 'postgres:15',
        'cache': 'redis:7-alpine'
    }
    services = {}
    for component in components:
        name = component['name']
        svc_type = component['type']
        deps = component.get('dependencies', [])
        service = {'restart': 'unless-stopped'}
        if svc_type in default_images:
            service['image'] = default_images[svc_type]
        else:
            service['build'] = f'./{name}'
        port = default_ports.get(svc_type)
        if port:
            service['ports'] = [f'{port}:{port}']
        if deps:
            service['depends_on'] = deps
        services[name] = service
    return {
        'version': '3.8',
        'services': services
    }
Section 17.3: Serverless Functions
Serverless computing lets you run code without managing servers. You are billed per request and for the compute time each invocation actually uses, not for idle capacity.
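To put rough numbers on the pay-per-use model, here is a small cost sketch. The two default rates are assumptions based on published on-demand x86 pricing for AWS Lambda in us-east-1 and ignore the free tier, so check the current price list before relying on them. The function name `estimate_lambda_monthly_cost` is illustrative.

```python
def estimate_lambda_monthly_cost(invocations: int, avg_duration_ms: float,
                                 memory_mb: int,
                                 price_per_gb_second: float = 0.0000166667,
                                 price_per_million_requests: float = 0.20) -> float:
    """Estimate monthly cost in USD from invocation count, duration, and memory size."""
    # Compute is billed in GB-seconds: duration (s) x configured memory (GB)
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    compute_cost = gb_seconds * price_per_gb_second
    # Requests are billed separately, per million invocations
    request_cost = invocations / 1_000_000 * price_per_million_requests
    return compute_cost + request_cost


# A checker that runs every 5 minutes, around the clock, for 30 days
invocations = 12 * 24 * 30
cost = estimate_lambda_monthly_cost(invocations, avg_duration_ms=2000, memory_mb=512)
print(f"{invocations} invocations -> ${cost:.4f} per month")
```

Under these assumed rates, a lightweight scheduled job costs cents per month. The picture changes for functions that run for minutes at a time or need several gigabytes of memory.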
Use Cases for Trading
| Use Case | Function Type | Trigger |
|---|---|---|
| Market data processing | Data pipeline | Schedule/Event |
| Alert notifications | Notification | Event |
| Report generation | Batch | Schedule |
| API webhooks | API handler | HTTP |
| Data backup | Maintenance | Schedule |
class LambdaFunctionGenerator:
"""
Generate AWS Lambda function templates.
"""
@staticmethod
def market_data_fetcher() -> str:
"""Generate market data fetcher Lambda."""
return '''
import json
import boto3
import yfinance as yf
from datetime import datetime
s3 = boto3.client('s3')
sns = boto3.client('sns')
def handler(event, context):
"""
Fetch market data and store in S3.
Triggered by CloudWatch Events (schedule).
"""
# Configuration
symbols = event.get('symbols', ['SPY', 'QQQ', 'IWM'])
bucket = event.get('bucket', 'my-market-data-bucket')
results = []
for symbol in symbols:
try:
# Fetch data
ticker = yf.Ticker(symbol)
data = ticker.history(period='1d')
if not data.empty:
# Store in S3
date_str = datetime.now().strftime('%Y-%m-%d')
key = f"daily/{symbol}/{date_str}.json"
s3.put_object(
Bucket=bucket,
Key=key,
Body=data.to_json(),
ContentType='application/json'
)
results.append({
'symbol': symbol,
'status': 'success',
'key': key
})
else:
results.append({
'symbol': symbol,
'status': 'no_data'
})
except Exception as e:
results.append({
'symbol': symbol,
'status': 'error',
'error': str(e)
})
return {
'statusCode': 200,
'body': json.dumps({
'timestamp': datetime.now().isoformat(),
'results': results
})
}
'''
@staticmethod
def price_alert_checker() -> str:
"""Generate price alert checker Lambda."""
return '''
import json
import boto3
import yfinance as yf
from datetime import datetime
dynamodb = boto3.resource('dynamodb')
sns = boto3.client('sns')
def handler(event, context):
"""
Check price alerts and send notifications.
Triggered by CloudWatch Events (every 5 minutes during market hours).
"""
# Get active alerts from DynamoDB
table = dynamodb.Table('price_alerts')
alerts = table.scan(
FilterExpression='active = :active',
ExpressionAttributeValues={':active': True}
)['Items']
triggered_alerts = []
for alert in alerts:
symbol = alert['symbol']
target_price = float(alert['target_price'])
condition = alert['condition'] # 'above' or 'below'
topic_arn = alert['topic_arn']
# Get current price
ticker = yf.Ticker(symbol)
current_price = ticker.fast_info['lastPrice']
# Check condition
triggered = False
if condition == 'above' and current_price >= target_price:
triggered = True
elif condition == 'below' and current_price <= target_price:
triggered = True
if triggered:
# Send notification
message = f"""PRICE ALERT TRIGGERED
Symbol: {symbol}
Condition: Price {condition} ${target_price}
Current Price: ${current_price:.2f}
Time: {datetime.now().isoformat()}
"""
sns.publish(
TopicArn=topic_arn,
Message=message,
Subject=f'Price Alert: {symbol}'
)
# Deactivate alert
table.update_item(
Key={'alert_id': alert['alert_id']},
UpdateExpression='SET active = :false',
ExpressionAttributeValues={':false': False}
)
triggered_alerts.append(alert['alert_id'])
return {
'statusCode': 200,
'body': json.dumps({
'checked': len(alerts),
'triggered': triggered_alerts
})
}
'''
@staticmethod
def report_generator() -> str:
"""Generate report generator Lambda."""
return '''
import json
import boto3
import pandas as pd
from datetime import datetime, timedelta
from io import BytesIO
s3 = boto3.client('s3')
ses = boto3.client('ses')
def handler(event, context):
"""
Generate daily performance report.
Triggered by CloudWatch Events (end of day).
"""
bucket = event.get('bucket', 'my-trading-bucket')
recipient = event.get('email', 'trader@example.com')
# Load trades from S3
date_str = datetime.now().strftime('%Y-%m-%d')
trades_key = f"trades/{date_str}.json"
try:
response = s3.get_object(Bucket=bucket, Key=trades_key)
trades_df = pd.read_json(response['Body'])
except Exception: # e.g. no trades file for today; avoid a bare except, which also swallows SystemExit
trades_df = pd.DataFrame()
# Generate report
if not trades_df.empty:
total_pnl = trades_df['pnl'].sum()
num_trades = len(trades_df)
win_rate = (trades_df['pnl'] > 0).mean()
report = f"""
DAILY TRADING REPORT - {date_str}
================================
Summary:
- Total Trades: {num_trades}
- Total P&L: ${total_pnl:,.2f}
- Win Rate: {win_rate:.1%}
Top Performers:
{trades_df.nlargest(3, 'pnl')[['symbol', 'pnl']].to_string()}
Worst Performers:
{trades_df.nsmallest(3, 'pnl')[['symbol', 'pnl']].to_string()}
"""
else:
report = f"DAILY REPORT - {date_str}\\n\\nNo trades executed today."
# Send email via SES
ses.send_email(
Source='reports@trading-system.com',
Destination={'ToAddresses': [recipient]},
Message={
'Subject': {'Data': f'Daily Trading Report - {date_str}'},
'Body': {'Text': {'Data': report}}
}
)
return {
'statusCode': 200,
'body': json.dumps({'report_sent': True})
}
'''
# Display Lambda function examples
print("Market Data Fetcher Lambda:")
print("=" * 50)
print(LambdaFunctionGenerator.market_data_fetcher())
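The handlers above call AWS and Yahoo Finance directly, which makes them awkward to unit test. One option, offered here as a suggestion rather than as part of the original templates, is to pull the decision logic into a pure function. The name `is_alert_triggered` is hypothetical. The comparison rules mirror the `above` and `below` branches in `price_alert_checker`.

```python
def is_alert_triggered(condition: str, current_price: float, target_price: float) -> bool:
    """Return True when the price satisfies the alert condition."""
    if condition == "above":
        return current_price >= target_price
    if condition == "below":
        return current_price <= target_price
    # Design choice in this sketch: fail loudly instead of silently skipping
    raise ValueError(f"Unknown alert condition: {condition!r}")


# Illustrative alerts and prices (made-up values, no network access needed)
alerts = [
    {"symbol": "SPY", "condition": "above", "target_price": 500.0},
    {"symbol": "QQQ", "condition": "below", "target_price": 400.0},
]
prices = {"SPY": 505.25, "QQQ": 410.10}

triggered = [
    a["symbol"] for a in alerts
    if is_alert_triggered(a["condition"], prices[a["symbol"]], a["target_price"])
]
print(triggered)
```

Unlike the template, which ignores unrecognised conditions, this version raises an error. Whether that is preferable depends on how much you trust the data in the alerts table.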
Exercise 17.3: Lambda Configuration Builder (Guided)
Create a function that builds Lambda function configurations with memory and timeout settings.
Solution:
def build_lambda_config(functions: List[Dict]) -> Dict:
    """
    Build Lambda function configurations with resource allocation.

    Args:
        functions: List of function specs

    Returns:
        Lambda configuration dict
    """
    # Memory sizes in MB
    memory_tiers = {
        'light': 128,
        'standard': 512,
        'heavy': 1024,
        'compute': 2048
    }
    # Timeouts in seconds (900 is the Lambda maximum)
    timeout_presets = {
        'quick': 30,
        'standard': 60,
        'long': 300,
        'max': 900
    }
    configs = {}
    total_memory = 0
    for func in functions:
        name = func['name']
        mem_tier = func.get('memory', 'standard')
        timeout_type = func.get('timeout', 'standard')
        # Unknown tier names fall back to the 'standard' values
        memory = memory_tiers.get(mem_tier, 512)
        timeout = timeout_presets.get(timeout_type, 60)
        config = {
            'function_name': name,
            'runtime': 'python3.10',
            'handler': f'{name}.handler',
            'memory_size': memory,
            'timeout': timeout,
            'environment': func.get('env', {})
        }
        if 'schedule' in func:
            config['trigger'] = {
                'type': 'schedule',
                'expression': func['schedule']
            }
        configs[name] = config
        total_memory += memory
    return {
        'functions': configs,
        'count': len(configs),
        'total_memory_mb': total_memory
    }
# Generate CloudFormation template for Lambda deployment
class CloudFormationGenerator:
"""
Generate CloudFormation templates for serverless deployment.
"""
@staticmethod
def lambda_stack(function_name: str, runtime: str = 'python3.10',
memory: int = 256, timeout: int = 60) -> dict:
"""Generate CloudFormation template for Lambda function."""
return {
'AWSTemplateFormatVersion': '2010-09-09',
'Description': f'Lambda function: {function_name}',
'Parameters': {
'Environment': {
'Type': 'String',
'Default': 'dev',
'AllowedValues': ['dev', 'staging', 'prod']
}
},
'Resources': {
'LambdaExecutionRole': {
'Type': 'AWS::IAM::Role',
'Properties': {
'AssumeRolePolicyDocument': {
'Version': '2012-10-17',
'Statement': [{
'Effect': 'Allow',
'Principal': {'Service': 'lambda.amazonaws.com'},
'Action': 'sts:AssumeRole'
}]
},
'ManagedPolicyArns': [
'arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole'
]
}
},
'LambdaFunction': {
'Type': 'AWS::Lambda::Function',
'Properties': {
'FunctionName': {'Fn::Sub': f'{function_name}-${{Environment}}'},
'Runtime': runtime,
'Handler': 'handler.handler',
'MemorySize': memory,
'Timeout': timeout,
'Role': {'Fn::GetAtt': ['LambdaExecutionRole', 'Arn']},
'Environment': {
'Variables': {
'ENVIRONMENT': {'Ref': 'Environment'}
}
}
}
},
'ScheduleRule': {
'Type': 'AWS::Events::Rule',
'Properties': {
'ScheduleExpression': 'rate(5 minutes)',
'State': 'ENABLED',
'Targets': [{
'Id': 'LambdaTarget',
'Arn': {'Fn::GetAtt': ['LambdaFunction', 'Arn']}
}]
}
}
},
'Outputs': {
'FunctionArn': {
'Value': {'Fn::GetAtt': ['LambdaFunction', 'Arn']}
}
}
}
cf_template = CloudFormationGenerator.lambda_stack(
'market-data-fetcher',
memory=512,
timeout=120
)
print("CloudFormation Template:")
print("=" * 50)
print(yaml.dump(cf_template, default_flow_style=False))
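A caveat about the template above: an `AWS::Events::Rule` that targets a Lambda function also needs a resource-based permission. Without it the rule fires but the invocation is denied. The sketch below adds that resource, assuming the logical IDs `LambdaFunction` and `ScheduleRule` used in `lambda_stack`. The helper `add_schedule_permission` and the logical ID `SchedulePermission` are illustrative names.

```python
def add_schedule_permission(template: dict) -> dict:
    """Add a permission that lets the schedule rule invoke the Lambda function."""
    resources = template.setdefault("Resources", {})
    resources["SchedulePermission"] = {
        "Type": "AWS::Lambda::Permission",
        "Properties": {
            # Ref on an AWS::Lambda::Function resolves to the function name
            "FunctionName": {"Ref": "LambdaFunction"},
            "Action": "lambda:InvokeFunction",
            "Principal": "events.amazonaws.com",
            # Restrict the grant to this one rule
            "SourceArn": {"Fn::GetAtt": ["ScheduleRule", "Arn"]},
        },
    }
    return template


# Stub template with only the logical IDs the helper depends on
stub = {
    "Resources": {
        "LambdaFunction": {"Type": "AWS::Lambda::Function", "Properties": {}},
        "ScheduleRule": {"Type": "AWS::Events::Rule", "Properties": {}},
    }
}
patched_template = add_schedule_permission(stub)
print(sorted(patched_template["Resources"]))
```

Before a stack like this will deploy, the function resource also needs a `Code` property (for example an S3 bucket and key pointing at the deployment package), which the generated template leaves out.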
Section 17.4: CI/CD Pipelines
CI/CD (Continuous Integration/Continuous Deployment) automates testing and deployment.
Pipeline Stages
- Build: Compile code, install dependencies
- Test: Run unit tests, integration tests
- Analyze: Code quality, security scanning
- Deploy: Push to staging/production
class CICDPipelineGenerator:
"""
Generate CI/CD pipeline configurations.
"""
@staticmethod
def github_actions_workflow() -> str:
"""Generate GitHub Actions workflow for trading application."""
workflow = {
'name': 'Trading System CI/CD',
'on': {
'push': {'branches': ['main', 'develop']},
'pull_request': {'branches': ['main']}
},
'env': {
'PYTHON_VERSION': '3.10',
'AWS_REGION': 'us-east-1'
},
'jobs': {
'test': {
'runs-on': 'ubuntu-latest',
'steps': [
{'uses': 'actions/checkout@v4'},
{
'name': 'Set up Python',
'uses': 'actions/setup-python@v4',
'with': {'python-version': '${{ env.PYTHON_VERSION }}'}
},
{
'name': 'Install dependencies',
'run': 'pip install -r requirements.txt -r requirements-dev.txt'
},
{
'name': 'Run linting',
'run': 'flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics'
},
{
'name': 'Run tests',
'run': 'pytest tests/ -v --cov=src --cov-report=xml'
},
{
'name': 'Upload coverage',
'uses': 'codecov/codecov-action@v3',
'with': {'files': 'coverage.xml'}
}
]
},
'build': {
'needs': 'test',
'runs-on': 'ubuntu-latest',
'if': "github.ref == 'refs/heads/main'",
'steps': [
{'uses': 'actions/checkout@v4'},
{
'name': 'Configure AWS credentials',
'uses': 'aws-actions/configure-aws-credentials@v4',
'with': {
'aws-access-key-id': '${{ secrets.AWS_ACCESS_KEY_ID }}',
'aws-secret-access-key': '${{ secrets.AWS_SECRET_ACCESS_KEY }}',
'aws-region': '${{ env.AWS_REGION }}'
}
},
{
'name': 'Login to ECR',
'uses': 'aws-actions/amazon-ecr-login@v2'
},
{
'name': 'Build and push Docker image',
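# Multi-line shell script. No leading '|' here: that is YAML block-scalar syntax,
# and inside a Python string it would be emitted as a literal shell command.
'run': (
'docker build -t trading-system .\n'
'docker tag trading-system:latest ${{ secrets.ECR_REGISTRY }}/trading-system:${{ github.sha }}\n'
'docker push ${{ secrets.ECR_REGISTRY }}/trading-system:${{ github.sha }}\n'
)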
}
]
},
'deploy-staging': {
'needs': 'build',
'runs-on': 'ubuntu-latest',
'environment': 'staging',
'steps': [
{'uses': 'actions/checkout@v4'},
{
'name': 'Deploy to ECS Staging',
'run': 'aws ecs update-service --cluster staging --service trading-system --force-new-deployment'
}
]
},
'deploy-production': {
'needs': 'deploy-staging',
'runs-on': 'ubuntu-latest',
'environment': 'production',
'steps': [
{'uses': 'actions/checkout@v4'},
{
'name': 'Deploy to ECS Production',
'run': 'aws ecs update-service --cluster production --service trading-system --force-new-deployment'
}
]
}
}
}
return yaml.dump(workflow, default_flow_style=False, sort_keys=False)
@staticmethod
def pre_commit_config() -> str:
"""Generate pre-commit configuration."""
config = {
'repos': [
{
'repo': 'https://github.com/pre-commit/pre-commit-hooks',
'rev': 'v4.4.0',
'hooks': [
{'id': 'trailing-whitespace'},
{'id': 'end-of-file-fixer'},
{'id': 'check-yaml'},
{'id': 'check-json'},
{'id': 'check-merge-conflict'}
]
},
{
'repo': 'https://github.com/psf/black',
'rev': '23.9.1',
'hooks': [{'id': 'black', 'language_version': 'python3.10'}]
},
{
'repo': 'https://github.com/PyCQA/flake8',
'rev': '6.1.0',
'hooks': [{'id': 'flake8'}]
},
{
'repo': 'https://github.com/PyCQA/isort',
'rev': '5.12.0',
'hooks': [{'id': 'isort'}]
},
{
'repo': 'local',
'hooks': [
{
'id': 'pytest',
'name': 'pytest',
'entry': 'pytest tests/ -x',
'language': 'system',
'pass_filenames': False,
'always_run': True
}
]
}
]
}
return yaml.dump(config, default_flow_style=False)
print("GitHub Actions Workflow:")
print("=" * 50)
print(CICDPipelineGenerator.github_actions_workflow())
print("Pre-commit Configuration:")
print("=" * 50)
print(CICDPipelineGenerator.pre_commit_config())
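The `needs` keys in the workflow above form a dependency chain: test, then build, then staging, then production. As a rough way to check such a chain before committing it, the sketch below derives the execution order with the standard-library `graphlib` module (Python 3.9+). The function name `job_order` is hypothetical, and the `jobs` dict is a reduced stand-in for the workflow's `jobs` section.

```python
from graphlib import TopologicalSorter
from typing import Dict, List


def job_order(jobs: Dict[str, dict]) -> List[str]:
    """Return job names in an order that respects every `needs` dependency."""
    graph = {}
    for name, job in jobs.items():
        needs = job.get("needs", [])
        if isinstance(needs, str):  # GitHub Actions allows a single string or a list
            needs = [needs]
        for dep in needs:
            if dep not in jobs:
                raise ValueError(f"Job {name!r} needs undefined job {dep!r}")
        graph[name] = set(needs)
    # static_order() raises graphlib.CycleError if the jobs depend on each other in a loop
    return list(TopologicalSorter(graph).static_order())


jobs = {
    "deploy-production": {"needs": "deploy-staging"},
    "build": {"needs": "test"},
    "test": {},
    "deploy-staging": {"needs": "build"},
}
print(job_order(jobs))
```

A typo in a `needs` value is reported here as a `ValueError`, which is cheaper to discover locally than in a failed workflow run.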
Exercise 17.4: Complete Infrastructure Generator (Open-ended)
Build a class that generates complete infrastructure configurations for a trading platform, including cloud architecture, Docker setup, and CI/CD pipeline.
Solution:
class InfrastructureGenerator:
"""
Generate complete infrastructure configurations for trading platforms.
"""
def __init__(self, project_name: str, provider: str = 'aws'):
self.project_name = project_name
self.provider = provider
self.services = []
self.environments = ['dev', 'staging', 'prod']
def add_service(self, name: str, service_type: str, tier: str = 'medium'):
"""Add a service to the infrastructure."""
self.services.append({
'name': name,
'type': service_type,
'tier': tier
})
def generate_architecture(self) -> Dict:
"""Generate cloud architecture configuration."""
costs = {
'compute': {'small': 50, 'medium': 150, 'large': 400},
'serverless': {'small': 10, 'medium': 50, 'large': 200},
'database': {'small': 30, 'medium': 100, 'large': 500},
'storage': {'small': 5, 'medium': 25, 'large': 100},
'queue': {'small': 1, 'medium': 10, 'large': 50},
}
architecture = {
'provider': self.provider,
'project': self.project_name,
'services': [],
'total_cost': 0
}
for svc in self.services:
cost = costs.get(svc['type'], {}).get(svc['tier'], 50)
architecture['services'].append({
**svc,
'estimated_cost': cost
})
architecture['total_cost'] += cost
return architecture
def generate_docker_compose(self) -> str:
"""Generate Docker Compose configuration."""
compose = DockerComposeGenerator(self.project_name)
for svc in self.services:
if svc['type'] == 'database':
compose.add_service(
name=svc['name'],
image='postgres:15',
ports=['5432:5432'],
environment={'POSTGRES_PASSWORD': '${DB_PASSWORD}'}
)
elif svc['type'] == 'storage':
compose.add_service(
name=svc['name'],
image='redis:7-alpine',
ports=['6379:6379']
)
else:
compose.add_service(
name=svc['name'],
build=f"./{svc['name']}"
)
return compose.generate()
def generate_cicd(self) -> str:
"""Generate CI/CD pipeline configuration."""
return CICDPipelineGenerator.github_actions_workflow()
def generate_env_template(self) -> str:
"""Generate environment variable template."""
lines = ['# Environment Configuration', '']
for svc in self.services:
lines.append(f'# {svc["name"].upper()}')
if svc['type'] == 'database':
lines.append('DATABASE_URL=postgresql://user:password@localhost:5432/db')
lines.append('DB_PASSWORD=changeme')
elif svc['type'] == 'storage':
lines.append('REDIS_URL=redis://localhost:6379')
lines.append('')
lines.extend([
'# API Keys',
'MARKET_DATA_API_KEY=your_key',
'BROKER_API_KEY=your_key',
'',
'# Application',
'ENVIRONMENT=development',
'SECRET_KEY=changeme'
])
return '\n'.join(lines)
def generate_all(self) -> Dict[str, str]:
"""Generate all infrastructure files."""
return {
'architecture.json': json.dumps(self.generate_architecture(), indent=2),
'docker-compose.yml': self.generate_docker_compose(),
'.github/workflows/ci-cd.yml': self.generate_cicd(),
'.env.example': self.generate_env_template()
}
# Test
infra = InfrastructureGenerator('QuantTradingPlatform', 'aws')
infra.add_service('api', 'compute', 'medium')
infra.add_service('strategy-engine', 'compute', 'large')
infra.add_service('postgres', 'database', 'medium')
infra.add_service('redis', 'storage', 'small')
infra.add_service('data-fetcher', 'serverless', 'medium')
files = infra.generate_all()
print(f"Generated {len(files)} files:")
for filename in files.keys():
print(f" - {filename}")
arch = infra.generate_architecture()
print(f"\nTotal Monthly Cost: ${arch['total_cost']}")
Exercise 17.5: Multi-Environment Deployer (Open-ended)
Create a deployment manager that handles multiple environments with proper configuration isolation.
Solution:
class MultiEnvironmentDeployer:
"""
Manage deployments across multiple environments.
"""
def __init__(self, project_name: str):
self.project_name = project_name
self.environments = {}
self.services = {}
# Tier scaling by environment
self.tier_mapping = {
'dev': {'small': 'small', 'medium': 'small', 'large': 'medium'},
'staging': {'small': 'small', 'medium': 'medium', 'large': 'medium'},
'prod': {'small': 'medium', 'medium': 'medium', 'large': 'large'}
}
def add_environment(self, name: str, region: str, config: Dict = None):
"""Add an environment."""
self.environments[name] = {
'name': name,
'region': region,
'config': config or {},
'services': {}
}
def add_service(self, name: str, service_type: str, base_tier: str = 'medium'):
"""Add a service (applies to all environments with tier scaling)."""
self.services[name] = {
'type': service_type,
'base_tier': base_tier
}
def get_environment_config(self, env_name: str) -> Dict:
"""Get configuration for specific environment."""
if env_name not in self.environments:
raise ValueError(f"Unknown environment: {env_name}")
env = self.environments[env_name]
tier_map = self.tier_mapping.get(env_name, self.tier_mapping['dev'])
config = {
'environment': env_name,
'region': env['region'],
'services': {}
}
for svc_name, svc_config in self.services.items():
scaled_tier = tier_map[svc_config['base_tier']]
config['services'][svc_name] = {
'type': svc_config['type'],
'tier': scaled_tier
}
return config
def generate_terraform(self, env_name: str) -> str:
"""Generate Terraform for specific environment."""
config = self.get_environment_config(env_name)
tf_lines = [
f'# Terraform configuration for {self.project_name}',
f'# Environment: {env_name}',
'',
'provider "aws" {',
f' region = "{config["region"]}"',
'}',
'',
'locals {',
f' environment = "{env_name}"',
f' project = "{self.project_name}"',
'}',
''
]
for svc_name, svc_config in config['services'].items():
tf_lines.append(f'# {svc_name}')
tf_lines.append(f'# Type: {svc_config["type"]}, Tier: {svc_config["tier"]}')
tf_lines.append('')
return '\n'.join(tf_lines)
def validate_consistency(self) -> Dict:
"""Validate configuration consistency."""
issues = []
# Check all environments have required services
for env_name in self.environments:
config = self.get_environment_config(env_name)
if len(config['services']) != len(self.services):
issues.append(f"{env_name}: service count mismatch")
# Check tier scaling makes sense
for svc_name, svc_config in self.services.items():
dev_tier = self.tier_mapping['dev'][svc_config['base_tier']]
prod_tier = self.tier_mapping['prod'][svc_config['base_tier']]
tier_order = ['small', 'medium', 'large']
if tier_order.index(dev_tier) > tier_order.index(prod_tier):
issues.append(f"{svc_name}: dev tier > prod tier")
return {
'valid': len(issues) == 0,
'issues': issues,
'environments_checked': list(self.environments.keys()),
'services_checked': list(self.services.keys())
}
def estimate_costs(self) -> Dict:
"""Estimate costs per environment."""
costs = {
'compute': {'small': 50, 'medium': 150, 'large': 400},
'serverless': {'small': 10, 'medium': 50, 'large': 200},
'database': {'small': 30, 'medium': 100, 'large': 500},
'storage': {'small': 5, 'medium': 25, 'large': 100},
}
estimates = {}
for env_name in self.environments:
config = self.get_environment_config(env_name)
total = 0
for svc_config in config['services'].values():
svc_costs = costs.get(svc_config['type'], costs['compute'])
total += svc_costs.get(svc_config['tier'], 50)
estimates[env_name] = total
estimates['total'] = sum(estimates.values())
return estimates
# Test
deployer = MultiEnvironmentDeployer('TradingPlatform')
# Add environments
deployer.add_environment('dev', 'us-east-1')
deployer.add_environment('staging', 'us-east-1')
deployer.add_environment('prod', 'us-east-1')
# Add services
deployer.add_service('api', 'compute', 'medium')
deployer.add_service('strategy', 'compute', 'large')
deployer.add_service('database', 'database', 'medium')
# Validate
validation = deployer.validate_consistency()
print(f"Configuration Valid: {validation['valid']}")
# Estimate costs
costs = deployer.estimate_costs()
print(f"\nMonthly Cost Estimates:")
for env, cost in costs.items():
print(f" {env}: ${cost}")
Exercise 17.6: Deployment Health Checker (Open-ended)
Create a system that monitors deployment health and generates status reports.
Solution:
import random
from datetime import datetime, timedelta
class DeploymentHealthChecker:
"""
Monitor deployment health and generate reports.
"""
def __init__(self, project_name: str):
self.project_name = project_name
self.services = {}
self.health_history = []
def add_service(self, name: str, endpoint: str, critical: bool = False):
"""Register a service for health monitoring."""
self.services[name] = {
'endpoint': endpoint,
'critical': critical,
'checks': []
}
def simulate_health_check(self, service_name: str) -> Dict:
"""Simulate a health check (normally would make HTTP request)."""
# Simulate realistic response times and occasional failures
is_healthy = random.random() > 0.05 # 95% success rate
response_time = random.uniform(10, 200) if is_healthy else None
result = {
'timestamp': datetime.now().isoformat(),
'service': service_name,
'healthy': is_healthy,
'response_time_ms': response_time,
'status_code': 200 if is_healthy else 500
}
if service_name in self.services:
self.services[service_name]['checks'].append(result)
return result
def run_all_checks(self) -> Dict:
"""Run health checks on all services."""
results = []
critical_failures = []
for name, config in self.services.items():
result = self.simulate_health_check(name)
results.append(result)
if not result['healthy'] and config['critical']:
critical_failures.append(name)
healthy_count = sum(1 for r in results if r['healthy'])
summary = {
'timestamp': datetime.now().isoformat(),
'total_services': len(results),
'healthy': healthy_count,
'unhealthy': len(results) - healthy_count,
'availability': healthy_count / len(results) * 100 if results else 0,
'critical_failures': critical_failures,
'overall_status': 'CRITICAL' if critical_failures else (
'HEALTHY' if healthy_count == len(results) else 'DEGRADED'
),
'details': results
}
self.health_history.append(summary)
return summary
def get_service_stats(self, service_name: str) -> Dict:
"""Get statistics for a specific service."""
if service_name not in self.services:
return {'error': 'Service not found'}
checks = self.services[service_name]['checks']
if not checks:
return {'error': 'No health checks recorded'}
healthy_checks = [c for c in checks if c['healthy']]
response_times = [c['response_time_ms'] for c in healthy_checks if c['response_time_ms']]
return {
'service': service_name,
'total_checks': len(checks),
'successful': len(healthy_checks),
'availability': len(healthy_checks) / len(checks) * 100,
'avg_response_time': sum(response_times) / len(response_times) if response_times else None,
'min_response_time': min(response_times) if response_times else None,
'max_response_time': max(response_times) if response_times else None
}
def generate_report(self) -> str:
"""Generate health status report."""
latest = self.run_all_checks()
report_lines = [
f"DEPLOYMENT HEALTH REPORT - {self.project_name}",
"=" * 50,
f"Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}",
"",
f"Overall Status: {latest['overall_status']}",
f"Availability: {latest['availability']:.1f}%",
f"Services: {latest['healthy']}/{latest['total_services']} healthy",
""
]
if latest['critical_failures']:
report_lines.append("CRITICAL FAILURES:")
for svc in latest['critical_failures']:
report_lines.append(f" - {svc} [IMMEDIATE ACTION REQUIRED]")
report_lines.append("")
report_lines.append("Service Details:")
for detail in latest['details']:
status = "OK" if detail['healthy'] else "FAIL"
rt = f"{detail['response_time_ms']:.0f}ms" if detail['response_time_ms'] else "N/A"
report_lines.append(f" [{status}] {detail['service']}: {rt}")
# Add recommendations
report_lines.extend(["", "Recommendations:"])
for detail in latest['details']:
if not detail['healthy']:
report_lines.append(f" - Investigate {detail['service']}: check logs, restart if needed")
elif detail['response_time_ms'] and detail['response_time_ms'] > 150:
report_lines.append(f" - {detail['service']}: high latency, consider scaling")
return "\n".join(report_lines)
# Test
checker = DeploymentHealthChecker('TradingPlatform')
checker.add_service('api', 'http://api:8000/health', critical=True)
checker.add_service('strategy-engine', 'http://strategy:8001/health', critical=True)
checker.add_service('database', 'http://postgres:5432', critical=True)
checker.add_service('redis', 'http://redis:6379/ping', critical=False)
checker.add_service('dashboard', 'http://dashboard:8050/health', critical=False)
# Run multiple checks
for _ in range(5):
checker.run_all_checks()
# Generate report
print(checker.generate_report())
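`simulate_health_check` only fakes its results. Below is a minimal sketch of what a real HTTP probe could look like using just the standard library. It returns the same keys as the simulated version so it could be swapped in. The function name `check_http_health` is hypothetical. Note that Postgres and Redis do not speak HTTP, so the `database` and `redis` endpoints registered above would need a TCP or client-level check instead.

```python
import http.client
import time
import urllib.error
import urllib.request
from datetime import datetime
from typing import Dict


def check_http_health(service_name: str, url: str, timeout: float = 2.0) -> Dict:
    """Probe an HTTP health endpoint and report status and response time."""
    started = time.perf_counter()
    healthy = False
    status_code = None
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status_code = response.status
            healthy = 200 <= status_code < 300
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status
        status_code = exc.code
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, DNS failure, timeout, or a non-HTTP reply
        pass
    elapsed_ms = (time.perf_counter() - started) * 1000
    return {
        'timestamp': datetime.now().isoformat(),
        'service': service_name,
        'healthy': healthy,
        'response_time_ms': elapsed_ms if healthy else None,
        'status_code': status_code,
    }


# Nothing is expected to listen on local port 1, so this shows the failure path
result = check_http_health('api', 'http://127.0.0.1:1/health', timeout=0.5)
print(result['healthy'], result['status_code'])
```

Keeping the timeout short matters: a monitoring loop that blocks for long periods on one dead service delays the checks for every other service.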
Module Project: Cloud Deployment Template
Build a complete deployment template generator for a trading system.
def generate_trading_requirements():
"""
Generate requirements.txt for a trading application.
"""
packages = [
# Core data handling
'pandas>=2.0.0',
'numpy>=1.24.0',
# Market data
'yfinance>=0.2.0',
# Web framework
'fastapi>=0.100.0',
'uvicorn>=0.23.0',
'gunicorn>=21.0.0',
# Database
'sqlalchemy>=2.0.0',
'psycopg2-binary>=2.9.0',
'alembic>=1.12.0',
# Visualization
'plotly>=5.15.0',
'dash>=2.14.0',
# Utilities
'python-dotenv>=1.0.0',
'pydantic>=2.0.0',
'redis>=4.6.0',
# Scheduling
'schedule>=1.2.0',
'celery>=5.3.0',
# Scientific computing
'scipy>=1.11.0',
]
return '\n'.join(packages)
class CloudDeploymentTemplate:
"""
Complete cloud deployment template generator.
Features:
- Multi-environment support
- Container configuration
- Infrastructure as code
- CI/CD pipeline
"""
def __init__(self, project_name: str, provider: str = 'aws'):
self.project_name = project_name
self.provider = provider
self.architecture = CloudArchitecture(project_name, provider)
self.files = {}
def add_standard_services(self):
"""Add standard services for a trading system."""
# Core services
self.architecture.add_service("api-server", "compute", "medium")
self.architecture.add_service("strategy-engine", "compute", "medium")
self.architecture.add_service("data-processor", "serverless", "medium")
self.architecture.add_service("database", "database", "medium")
self.architecture.add_service("cache", "storage", "small")
self.architecture.add_service("message-queue", "queue", "medium")
# Connections
self.architecture.connect("data-processor", "message-queue")
self.architecture.connect("message-queue", "strategy-engine")
self.architecture.connect("strategy-engine", "database")
self.architecture.connect("api-server", "database")
self.architecture.connect("api-server", "cache")
def generate_all_files(self) -> Dict[str, str]:
"""Generate all deployment files."""
# Dockerfile
self.files['Dockerfile'] = DockerfileGenerator.generate(
'python-api', port=8000, entrypoint='main'
)
# Docker Compose
compose = DockerComposeGenerator(self.project_name)
compose.add_service('api', build='.', ports=['8000:8000'],
environment={'DATABASE_URL': '${DATABASE_URL}'})
compose.add_service('postgres', image='postgres:15',
environment={'POSTGRES_PASSWORD': '${DB_PASSWORD}'})
compose.add_service('redis', image='redis:7-alpine')
self.files['docker-compose.yml'] = compose.generate()
# Requirements
self.files['requirements.txt'] = generate_trading_requirements()
# GitHub Actions
self.files['.github/workflows/ci-cd.yml'] = CICDPipelineGenerator.github_actions_workflow()
# Pre-commit
self.files['.pre-commit-config.yaml'] = CICDPipelineGenerator.pre_commit_config()
# Terraform
self.files['terraform/main.tf'] = self.architecture.generate_terraform()
# Environment template
self.files['.env.example'] = self._generate_env_template()
# Makefile
self.files['Makefile'] = self._generate_makefile()
# README
self.files['README.md'] = self._generate_readme()
return self.files
def _generate_env_template(self) -> str:
"""Generate environment variable template."""
return '''# Database
DATABASE_URL=postgresql://user:password@localhost:5432/trading
DB_PASSWORD=your_secure_password
# Redis
REDIS_URL=redis://localhost:6379
# API Keys
MARKET_DATA_API_KEY=your_api_key
BROKER_API_KEY=your_broker_key
BROKER_SECRET=your_broker_secret
# AWS (for deployment)
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_REGION=us-east-1
# Application
ENVIRONMENT=development
DEBUG=true
SECRET_KEY=your_secret_key_for_jwt
'''
def _generate_makefile(self) -> str:
"""Generate Makefile for common commands."""
return f'''# Makefile for {self.project_name}
.PHONY: install test lint format run docker-build docker-up docker-down docker-logs deploy-staging deploy-prod
install:
\tpip install -r requirements.txt
\tpre-commit install
test:
\tpytest tests/ -v --cov=src
lint:
\tflake8 src/ tests/
\tblack --check src/ tests/
format:
\tblack src/ tests/
\tisort src/ tests/
run:
\tuvicorn main:app --reload --port 8000
docker-build:
\tdocker-compose build
docker-up:
\tdocker-compose up -d
docker-down:
\tdocker-compose down
docker-logs:
\tdocker-compose logs -f
deploy-staging:
\tcd terraform && terraform workspace select staging
\tcd terraform && terraform apply -auto-approve
deploy-prod:
\tcd terraform && terraform workspace select production
\tcd terraform && terraform apply
'''
def _generate_readme(self) -> str:
"""Generate README documentation."""
return f'''# {self.project_name}
Quantitative trading system deployed on {self.provider.upper()}.
## Quick Start
```bash
# Clone repository
git clone <repo-url>
cd {self.project_name.lower().replace(" ", "-")}
# Setup environment
cp .env.example .env
make install
# Run locally
make docker-up
```
## Architecture
- **API Server**: REST API for client access
- **Strategy Engine**: Executes trading strategies
- **Data Processor**: Collects and processes market data
- **Database**: PostgreSQL for persistent storage
- **Cache**: Redis for high-speed data access
## Deployment
```bash
# Deploy to staging
make deploy-staging
# Deploy to production
make deploy-prod
```
## Development
```bash
# Run tests
make test
# Lint code
make lint
# Format code
make format
```
## Estimated Monthly Cost
${self.architecture.total_estimated_cost()}/month (varies with usage)
'''
def display_summary(self):
"""Display deployment template summary."""
print(f"Cloud Deployment Template: {self.project_name}")
print("=" * 60)
print()
print("Generated Files:")
for filename in self.files.keys():
print(f" - {filename}")
print()
self.architecture.display_architecture()
# Generate complete deployment template
template = CloudDeploymentTemplate("Quant Trading Platform", provider='aws')
template.add_standard_services()
files = template.generate_all_files()
template.display_summary()
# Display generated Makefile
print("\nGenerated Makefile:")
print("=" * 50)
print(files['Makefile'])
Key Takeaways
Cloud Architecture
- Choose provider based on your needs (AWS for breadth, GCP for ML, Azure for enterprise)
- Design for scalability and fault tolerance
- Use Infrastructure as Code (Terraform, CloudFormation)
Containerization
- Docker ensures consistent environments
- Use multi-stage builds to reduce image size
- Docker Compose for local development
Serverless
- Ideal for event-driven workloads
- Pay only for execution time
- Cold starts can add latency
CI/CD
- Automate testing and deployment
- Use pre-commit hooks for code quality
- Deploy to staging before production
Best Practices
- Never commit secrets - use environment variables
- Tag Docker images with git commit SHA
- Run tests before every deployment
- Monitor costs and set billing alerts
- Use multiple environments (dev, staging, prod)
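As a minimal sketch of the "never commit secrets" practice, the helper below reads a required setting from the environment and fails fast when it is missing. The function name `require_env` and the variable name `BROKER_API_KEY` are hypothetical, chosen only for illustration.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Hypothetical variable name; normally it is set by the shell, CI, or a .env loader.
os.environ.setdefault("BROKER_API_KEY", "demo-key-not-a-real-secret")
api_key = require_env("BROKER_API_KEY")
print(f"Loaded API key of length {len(api_key)}")
```

Failing at startup keeps a misconfigured deployment from running with a missing credential and surfacing the problem only at the first order.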
Next: Module 18 - 24/7 Operation
Module 18: 24/7 Operation
Part 5: Production & Infrastructure
| Attribute | Value |
|---|---|
| Duration | ~2.5 hours |
| Exercises | 6 (3 guided + 3 open-ended) |
Learning Objectives
By the end of this module, you will be able to:
- Implement comprehensive system monitoring with metrics collection
- Design and configure alerting systems for trading operations
- Manage incidents with structured response processes
- Build backup and recovery strategies for critical data
# Environment setup
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime, timedelta
from dataclasses import dataclass, field
from typing import List, Dict, Optional, Callable
from enum import Enum
import random
import json
import warnings
warnings.filterwarnings('ignore')
print("Module 18: 24/7 Operation")
print("=" * 40)
Section 18.1: System Monitoring
You can't fix what you can't see. Monitoring is the foundation of reliable operations.
What to Monitor
| Category | Metrics | Why It Matters |
|---|---|---|
| Infrastructure | CPU, memory, disk, network | Resource exhaustion |
| Application | Latency, errors, throughput | User experience |
| Business | Orders executed, PnL, positions | Trading outcomes |
| Dependencies | Database, API, broker | External failures |
The Four Golden Signals (SRE)
- Latency: How long requests take
- Traffic: How many requests you're handling
- Errors: Rate of failed requests
- Saturation: How "full" your service is
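The four signals can be sketched as a single summary over one time window. This is an illustrative example only: the `RequestRecord` shape, the `golden_signals` function, and the in-flight/capacity definition of saturation are assumptions, and the percentile uses the nearest-rank method rather than interpolation.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RequestRecord:
    """One handled request (hypothetical record shape)."""
    latency_ms: float
    ok: bool


def nearest_rank_percentile(values: List[float], pct: int) -> float:
    """Smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def golden_signals(records: List[RequestRecord], window_seconds: float,
                   in_flight: int, capacity: int) -> Dict[str, float]:
    """Summarize latency, traffic, errors, and saturation for one window."""
    saturation = in_flight / capacity
    if not records:
        return {'latency_p95_ms': 0.0, 'traffic_rps': 0.0,
                'error_rate': 0.0, 'saturation': saturation}
    failed = sum(1 for r in records if not r.ok)
    return {
        'latency_p95_ms': nearest_rank_percentile([r.latency_ms for r in records], 95),
        'traffic_rps': len(records) / window_seconds,
        'error_rate': failed / len(records),
        'saturation': saturation,
    }


# Synthetic data: 100 requests in 60 seconds, every tenth one fails.
records = [RequestRecord(latency_ms=20.0 + i, ok=(i % 10 != 0)) for i in range(100)]
signals = golden_signals(records, window_seconds=60.0, in_flight=8, capacity=32)
for name, value in signals.items():
    print(f"{name}: {value:.3f}")
```

Reporting a tail percentile rather than the mean matters for latency, because a handful of very slow requests barely moves the average.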
class MetricType(Enum):
GAUGE = "gauge" # Current value (e.g., CPU usage)
COUNTER = "counter" # Cumulative count (e.g., total requests)
HISTOGRAM = "histogram" # Distribution (e.g., latency)
@dataclass
class Metric:
"""A single metric measurement."""
name: str
value: float
timestamp: datetime
labels: Dict[str, str] = field(default_factory=dict)
metric_type: MetricType = MetricType.GAUGE
class MetricsCollector:
"""
Collects and stores system metrics.
"""
def __init__(self, retention_hours: int = 24):
self.retention_hours = retention_hours
self.metrics: Dict[str, List[Metric]] = {}
self.counters: Dict[str, float] = {}
def record(self, name: str, value: float, labels: Optional[Dict[str, str]] = None,
metric_type: MetricType = MetricType.GAUGE):
"""Record a metric value."""
metric = Metric(
name=name,
value=value,
timestamp=datetime.now(),
labels=labels or {},
metric_type=metric_type
)
if name not in self.metrics:
self.metrics[name] = []
self.metrics[name].append(metric)
self._cleanup(name)
def increment(self, name: str, value: float = 1.0):
"""Increment a counter metric."""
if name not in self.counters:
self.counters[name] = 0
self.counters[name] += value
self.record(name, self.counters[name], metric_type=MetricType.COUNTER)
def _cleanup(self, name: str):
"""Remove old metrics beyond retention period."""
cutoff = datetime.now() - timedelta(hours=self.retention_hours)
self.metrics[name] = [
m for m in self.metrics[name] if m.timestamp > cutoff
]
def get_latest(self, name: str) -> Optional[float]:
"""Get the most recent value for a metric."""
if name not in self.metrics or not self.metrics[name]:
return None
return self.metrics[name][-1].value
def get_series(self, name: str, hours: float = 1.0) -> pd.DataFrame:
"""Get time series data for a metric."""
if name not in self.metrics:
return pd.DataFrame()
cutoff = datetime.now() - timedelta(hours=hours)
data = [
{'timestamp': m.timestamp, 'value': m.value}
for m in self.metrics[name]
if m.timestamp > cutoff
]
return pd.DataFrame(data)
def get_stats(self, name: str, hours: float = 1.0) -> Dict:
"""Get statistics for a metric over time period."""
series = self.get_series(name, hours)
if series.empty:
return {}
values = series['value'].values
return {
'min': float(np.min(values)),
'max': float(np.max(values)),
'mean': float(np.mean(values)),
'std': float(np.std(values)),
'p50': float(np.percentile(values, 50)),
'p95': float(np.percentile(values, 95)),
'p99': float(np.percentile(values, 99)),
'count': len(values)
}
class HealthChecker:
"""
Performs health checks on system components.
"""
def __init__(self):
self.checks: Dict[str, Callable] = {}
self.results: Dict[str, Dict] = {}
def register_check(self, name: str, check_func: Callable):
"""Register a health check function."""
self.checks[name] = check_func
def run_checks(self) -> Dict[str, Dict]:
"""Run all registered health checks."""
self.results = {}
for name, check_func in self.checks.items():
start_time = datetime.now()
try:
result = check_func()
healthy = result.get('healthy', True)
message = result.get('message', 'OK')
except Exception as e:
healthy = False
message = str(e)
duration = (datetime.now() - start_time).total_seconds() * 1000
self.results[name] = {
'healthy': healthy,
'message': message,
'duration_ms': duration,
'timestamp': datetime.now().isoformat()
}
return self.results
def is_healthy(self) -> bool:
"""Check if all components are healthy."""
if not self.results:
self.run_checks()
return all(r['healthy'] for r in self.results.values())
def get_status(self) -> Dict:
"""Get overall system status."""
if not self.results:
self.run_checks()
healthy_count = sum(1 for r in self.results.values() if r['healthy'])
total_count = len(self.results)
return {
'status': 'healthy' if self.is_healthy() else 'unhealthy',
'healthy_checks': healthy_count,
'total_checks': total_count,
'timestamp': datetime.now().isoformat(),
'checks': self.results
}
# Create metrics collector and simulate data
collector = MetricsCollector()
# Simulate collecting metrics
# Note: record() stamps every sample with datetime.now(), so all 120 samples
# fall inside the one-hour window used by get_stats() below.
np.random.seed(42)
for i in range(120): # 120 simulated samples
# Simulated metrics
cpu = 30 + np.random.normal(0, 10) + (i % 20) * 0.5 # Periodic load
memory = 60 + np.random.normal(0, 5)
latency = 50 + np.random.exponential(20) # Long tail
collector.record('cpu_percent', max(0, min(100, cpu)))
collector.record('memory_percent', max(0, min(100, memory)))
collector.record('api_latency_ms', latency)
collector.increment('requests_total', np.random.randint(10, 100))
collector.increment('errors_total', np.random.choice([0, 0, 0, 1]))
# Display statistics
print("System Metrics Summary (Last Hour)")
print("=" * 50)
for metric in ['cpu_percent', 'memory_percent', 'api_latency_ms']:
stats = collector.get_stats(metric)
print(f"\n{metric}:")
print(f" Mean: {stats['mean']:.1f}")
print(f" P50: {stats['p50']:.1f}")
print(f" P95: {stats['p95']:.1f}")
print(f" P99: {stats['p99']:.1f}")
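The simulation above records `requests_total` and `errors_total` as cumulative counters, while alert rules usually need a rate. One possible way to derive a rate from two counter samples is sketched below; the helper names and the `(timestamp_seconds, value)` sample shape are assumptions made for this example.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, cumulative counter value)


def counter_rate(samples: List[Sample]) -> float:
    """Per-second rate of a cumulative counter between its first and last sample."""
    if len(samples) < 2:
        return 0.0
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    elapsed = t1 - t0
    if elapsed <= 0 or v1 < v0:  # v1 < v0 suggests the counter was reset
        return 0.0
    return (v1 - v0) / elapsed


def error_ratio(requests: List[Sample], errors: List[Sample]) -> float:
    """Fraction of requests that failed over the sampled window."""
    request_rate = counter_rate(requests)
    return counter_rate(errors) / request_rate if request_rate > 0 else 0.0


# Hypothetical samples taken one minute apart.
requests_total = [(0.0, 1000.0), (60.0, 1600.0)]
errors_total = [(0.0, 10.0), (60.0, 40.0)]
print(f"Request rate: {counter_rate(requests_total):.1f}/s")
print(f"Error ratio: {error_ratio(requests_total, errors_total):.1%}")
```

A ratio computed this way could feed an `error_rate` value into an alerting context instead of a raw counter, which only ever grows.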
# Set up health checks
health = HealthChecker()
# Simulated health check functions
def check_database():
# Simulate database connectivity check
return {'healthy': random.random() > 0.05, 'message': 'Database connected'}
def check_broker_api():
# Simulate broker API check
return {'healthy': random.random() > 0.1, 'message': 'Broker API responding'}
def check_market_data():
# Simulate market data feed check
return {'healthy': random.random() > 0.02, 'message': 'Market data streaming'}
def check_disk_space():
# Simulate disk space check
usage = random.uniform(40, 80)
return {
'healthy': usage < 90,
'message': f'Disk usage: {usage:.1f}%'
}
health.register_check('database', check_database)
health.register_check('broker_api', check_broker_api)
health.register_check('market_data', check_market_data)
health.register_check('disk_space', check_disk_space)
# Run health checks
status = health.get_status()
print("\nHealth Check Results")
print("=" * 50)
print(f"Overall Status: {status['status'].upper()}")
print(f"Healthy: {status['healthy_checks']}/{status['total_checks']}")
print()
for name, result in status['checks'].items():
status_icon = "✓" if result['healthy'] else "✗"
print(f" {status_icon} {name}: {result['message']} ({result['duration_ms']:.1f}ms)")
Exercise 18.1: Position Risk Health Check (Guided)
Create a health check that monitors position risk and returns unhealthy if any position exceeds a maximum size.
Solution:
def check_position_risk(positions: Dict[str, float], max_position_pct: float = 0.20) -> Dict:
"""
Check if any position exceeds maximum allowed size.
Args:
positions: Dict of {symbol: position_value}
max_position_pct: Maximum allowed position as a fraction of total gross exposure (0.20 = 20%)
Returns:
Health check result dict
"""
if not positions:
return {'healthy': True, 'message': 'No positions'}
total_value = sum(abs(v) for v in positions.values())
if total_value == 0:
return {'healthy': True, 'message': 'No positions'}
violations = []
for symbol, value in positions.items():
position_pct = abs(value) / total_value
if position_pct > max_position_pct:
violations.append(f"{symbol}: {position_pct:.1%}")
if violations:
return {
'healthy': False,
'message': f"Position limits exceeded: {', '.join(violations)}"
}
return {
'healthy': True,
'message': f'All positions within {max_position_pct:.0%} limit'
}
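A check that takes arguments does not directly fit a registry that calls each check with no arguments, as `HealthChecker.run_checks` does. One way to bridge the gap is `functools.partial`. The sketch below uses a simplified, hypothetical stand-in (`check_max_weight`) so that it runs on its own.

```python
from functools import partial
from typing import Callable, Dict


def check_max_weight(positions: Dict[str, float], max_weight: float) -> Dict:
    """Simplified stand-in for a parameterized risk check (hypothetical)."""
    total = sum(abs(v) for v in positions.values())
    if total == 0:
        return {'healthy': True, 'message': 'No positions'}
    largest = max(abs(v) / total for v in positions.values())
    return {'healthy': largest <= max_weight,
            'message': f'Largest weight: {largest:.1%}'}


positions = {'AAPL': 50_000.0, 'MSFT': 30_000.0, 'GOOG': 20_000.0}

# Bind the arguments now; the result is a zero-argument callable.
bound_check: Callable[[], Dict] = partial(check_max_weight, positions, max_weight=0.40)

result = bound_check()
print(result)
```

Because `partial` keeps a reference to the `positions` dict rather than a copy, later updates to that dict are visible the next time the check runs.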
Section 18.2: Alerting Systems
Monitoring is useless if no one sees the problems. Alerting bridges that gap.
Alert Design Principles
- Actionable: Every alert should require action
- Urgent: Reserve alerts for time-sensitive issues
- Meaningful: Avoid alert fatigue from false positives
- Informative: Include enough context to diagnose
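One common way to reduce false positives is to require several consecutive breaches before an alert fires. The class below is a hypothetical sketch of that idea. It is written as a callable taking a context dict, so it could serve as a rule condition, but the name and the default of three readings are assumptions.

```python
class ConsecutiveBreachCondition:
    """Returns True only after `required` consecutive readings exceed `threshold`."""

    def __init__(self, key: str, threshold: float, required: int = 3):
        self.key = key
        self.threshold = threshold
        self.required = required
        self._streak = 0

    def __call__(self, ctx: dict) -> bool:
        if ctx.get(self.key, 0) > self.threshold:
            self._streak += 1
        else:
            self._streak = 0  # any normal reading resets the streak
        return self._streak >= self.required


condition = ConsecutiveBreachCondition('cpu_percent', threshold=80, required=3)
readings = [85, 70, 90, 92, 95, 60]
fired = [condition({'cpu_percent': r}) for r in readings]
print(fired)
```

The single spike at 85 never fires. Only the third breach in a row (95) does, which trades a short detection delay for fewer spurious alerts.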
class AlertSeverity(Enum):
INFO = "info"
WARNING = "warning"
CRITICAL = "critical"
@dataclass
class Alert:
"""Represents a system alert."""
name: str
severity: AlertSeverity
message: str
timestamp: datetime = field(default_factory=datetime.now)
labels: Dict = field(default_factory=dict)
resolved: bool = False
resolved_at: Optional[datetime] = None
@dataclass
class AlertRule:
"""Defines when an alert should fire."""
name: str
condition: Callable # Function that returns True if alert should fire
severity: AlertSeverity
message_template: str
cooldown_minutes: int = 5 # Minimum time between alerts
last_fired: Optional[datetime] = None
class AlertManager:
"""
Manages alert rules, firing, and notification.
"""
def __init__(self):
self.rules: List[AlertRule] = []
self.active_alerts: List[Alert] = []
self.alert_history: List[Alert] = []
self.notification_channels: List[Callable] = []
def add_rule(self, name: str, condition: Callable, severity: AlertSeverity,
message_template: str, cooldown_minutes: int = 5):
"""Add an alert rule."""
rule = AlertRule(
name=name,
condition=condition,
severity=severity,
message_template=message_template,
cooldown_minutes=cooldown_minutes
)
self.rules.append(rule)
def add_notification_channel(self, channel: Callable):
"""Add a notification channel (function that receives alerts)."""
self.notification_channels.append(channel)
def check_rules(self, context: Dict) -> List[Alert]:
"""
Check all rules and fire alerts as needed.
Parameters:
-----------
context : dict
Current system state for rule evaluation
Returns:
--------
List[Alert] : New alerts that were fired
"""
new_alerts = []
now = datetime.now()
for rule in self.rules:
# Check cooldown
if rule.last_fired:
cooldown_end = rule.last_fired + timedelta(minutes=rule.cooldown_minutes)
if now < cooldown_end:
continue
# Evaluate condition
try:
should_fire = rule.condition(context)
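except Exception as e:
# A broken rule must not stop the remaining rules, but it should not fail silently
print(f"Rule '{rule.name}' failed to evaluate: {e}")
should_fire = False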
if should_fire:
# Create alert
message = rule.message_template.format(**context)
alert = Alert(
name=rule.name,
severity=rule.severity,
message=message
)
# Record and notify
self.active_alerts.append(alert)
self.alert_history.append(alert)
new_alerts.append(alert)
rule.last_fired = now
# Send to notification channels
for channel in self.notification_channels:
try:
channel(alert)
except Exception as e:
print(f"Notification failed: {e}")
return new_alerts
def resolve_alert(self, alert_name: str):
"""Resolve active alerts by name."""
for alert in self.active_alerts:
if alert.name == alert_name and not alert.resolved:
alert.resolved = True
alert.resolved_at = datetime.now()
# Remove resolved alerts from active list
self.active_alerts = [a for a in self.active_alerts if not a.resolved]
def get_active_alerts(self) -> List[Alert]:
"""Get all active (unresolved) alerts."""
return self.active_alerts
def get_alert_summary(self) -> Dict:
"""Get summary of alert activity."""
return {
'active_count': len(self.active_alerts),
'critical_count': sum(1 for a in self.active_alerts if a.severity == AlertSeverity.CRITICAL),
'warning_count': sum(1 for a in self.active_alerts if a.severity == AlertSeverity.WARNING),
'total_fired_24h': sum(
1 for a in self.alert_history
if a.timestamp > datetime.now() - timedelta(hours=24)
),
'active_alerts': [
{'name': a.name, 'severity': a.severity.value, 'message': a.message}
for a in self.active_alerts
]
}
# Create alert manager
alerts = AlertManager()
# Add notification channel (just print for demo)
def console_notification(alert: Alert):
severity_emoji = {'info': 'ℹ️', 'warning': '⚠️', 'critical': '🚨'}
emoji = severity_emoji.get(alert.severity.value, '📢')
print(f"{emoji} [{alert.severity.value.upper()}] {alert.name}: {alert.message}")
alerts.add_notification_channel(console_notification)
# Add alert rules
alerts.add_rule(
name='high_cpu',
condition=lambda ctx: ctx.get('cpu_percent', 0) > 80,
severity=AlertSeverity.WARNING,
message_template='CPU usage at {cpu_percent:.1f}%',
cooldown_minutes=5
)
alerts.add_rule(
name='critical_cpu',
condition=lambda ctx: ctx.get('cpu_percent', 0) > 95,
severity=AlertSeverity.CRITICAL,
message_template='CRITICAL: CPU at {cpu_percent:.1f}%!',
cooldown_minutes=1
)
alerts.add_rule(
name='high_error_rate',
condition=lambda ctx: ctx.get('error_rate', 0) > 0.05,
severity=AlertSeverity.WARNING,
message_template='Error rate elevated: {error_rate:.1%}',
cooldown_minutes=5
)
alerts.add_rule(
name='large_drawdown',
condition=lambda ctx: ctx.get('drawdown_pct', 0) > 0.10,
severity=AlertSeverity.CRITICAL,
message_template='Large drawdown: {drawdown_pct:.1%}',
cooldown_minutes=15
)
print("Alert Rules Configured")
print("=" * 50)
for rule in alerts.rules:
print(f" {rule.severity.value.upper():8} {rule.name}")
# Simulate system states and check alerts
print("\nSimulating Alert Scenarios")
print("=" * 50)
# Normal state
print("\n1. Normal state:")
context_normal = {'cpu_percent': 45, 'error_rate': 0.01, 'drawdown_pct': 0.02}
new_alerts = alerts.check_rules(context_normal)
if not new_alerts:
print(" No alerts fired")
# High CPU
print("\n2. High CPU:")
context_high_cpu = {'cpu_percent': 85, 'error_rate': 0.01, 'drawdown_pct': 0.02}
alerts.check_rules(context_high_cpu)
# Critical situation
print("\n3. Critical situation:")
context_critical = {'cpu_percent': 97, 'error_rate': 0.08, 'drawdown_pct': 0.12}
alerts.check_rules(context_critical)
# Summary
print("\nAlert Summary:")
summary = alerts.get_alert_summary()
print(f" Active alerts: {summary['active_count']}")
print(f" Critical: {summary['critical_count']}")
print(f" Warning: {summary['warning_count']}")
Exercise 18.2: Alert Rule Builder (Guided)
Build a function that creates alert rules with proper threshold configuration.
Solution:
def create_threshold_alert(metric_name: str, warning_threshold: float,
critical_threshold: float, comparison: str = 'above') -> List[Dict]:
"""
Create warning and critical alert rules for a metric.
Args:
metric_name: Name of the metric to monitor
warning_threshold: Threshold for warning alerts
critical_threshold: Threshold for critical alerts
comparison: 'above' or 'below'
Returns:
List of alert rule configurations
"""
rules = []
if comparison == 'above':
warning_condition = lambda ctx: ctx.get(metric_name, 0) > warning_threshold
critical_condition = lambda ctx: ctx.get(metric_name, 0) > critical_threshold
else:
warning_condition = lambda ctx: ctx.get(metric_name, float('inf')) < warning_threshold
critical_condition = lambda ctx: ctx.get(metric_name, float('inf')) < critical_threshold
warning_rule = {
'name': f'{metric_name}_warning',
'condition': warning_condition,
'severity': AlertSeverity.WARNING,
'message_template': f'{metric_name} {{' + metric_name + f'}} {comparison} warning threshold ({warning_threshold})'
}
critical_rule = {
'name': f'{metric_name}_critical',
'condition': critical_condition,
'severity': AlertSeverity.CRITICAL,
'message_template': f'{metric_name} {{' + metric_name + f'}} {comparison} critical threshold ({critical_threshold})'
}
rules.append(warning_rule)
rules.append(critical_rule)
return rules
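The message templates in this solution rely on escaped braces, which can be hard to read. The short sketch below, using illustrative values, shows an equivalent single f-string and how the resulting template is filled in later with `str.format`.

```python
metric_name = 'cpu_percent'  # illustrative values
comparison = 'above'
warning_threshold = 80

# In an f-string, '{{' and '}}' emit literal braces, so the metric name ends up
# wrapped in one pair of braces that str.format() can fill in later.
template = f'{metric_name} {{{metric_name}}} {comparison} warning threshold ({warning_threshold})'
print(template)

context = {'cpu_percent': 91.5}
print(template.format(**context))
```

The first line printed is the stored template, `cpu_percent {cpu_percent} above warning threshold (80)`. The second is the rendered alert message with the current reading substituted.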
Section 18.3: Incident Response
When things go wrong (and they will), having a systematic response process is crucial.
Incident Lifecycle
- Detection: Alert fires or user reports issue
- Triage: Assess severity and impact
- Response: Follow runbook, mitigate impact
- Resolution: Fix the root cause
- Post-mortem: Learn and prevent recurrence
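The lifecycle above can be enforced as a small state machine so that an incident cannot skip or reverse stages by accident. The transition map below is an assumption made for illustration. A real team may allow other moves, such as reopening a resolved incident.

```python
from typing import Dict, Set

# Assumed transition map; adjust to match the team's actual process.
ALLOWED_TRANSITIONS: Dict[str, Set[str]] = {
    'open': {'investigating', 'resolved'},
    'investigating': {'mitigating', 'resolved'},
    'mitigating': {'investigating', 'resolved'},
    'resolved': set(),
}


def validate_transition(current: str, new: str) -> None:
    """Raise ValueError if moving from `current` to `new` is not allowed."""
    if current not in ALLOWED_TRANSITIONS:
        raise ValueError(f"Unknown status: {current}")
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"Invalid transition: {current} -> {new}")


validate_transition('open', 'investigating')  # allowed, returns None

try:
    validate_transition('resolved', 'open')
except ValueError as exc:
    print(exc)
```

A check like this would typically run at the top of a status-update method, before the timeline entry is written.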
class IncidentSeverity(Enum):
SEV1 = "sev1" # Critical: Trading halted, major financial impact
SEV2 = "sev2" # High: Degraded performance, significant impact
SEV3 = "sev3" # Medium: Minor issues, limited impact
SEV4 = "sev4" # Low: Cosmetic issues, no financial impact
@dataclass
class Incident:
"""Represents a system incident."""
id: str
title: str
severity: IncidentSeverity
description: str
created_at: datetime = field(default_factory=datetime.now)
resolved_at: Optional[datetime] = None
status: str = "open" # open, investigating, mitigating, resolved
timeline: List[Dict] = field(default_factory=list)
affected_systems: List[str] = field(default_factory=list)
impact: str = ""
root_cause: str = ""
resolution: str = ""
@dataclass
class Runbook:
"""A runbook for incident response."""
name: str
description: str
symptoms: List[str]
steps: List[Dict] # {step: str, expected_outcome: str}
escalation: str
estimated_time_minutes: int
class IncidentManager:
"""
Manages incident lifecycle and documentation.
"""
def __init__(self):
self.incidents: Dict[str, Incident] = {}
self.runbooks: Dict[str, Runbook] = {}
self._incident_counter = 0
def create_incident(self, title: str, severity: IncidentSeverity,
description: str, affected_systems: Optional[List[str]] = None) -> Incident:
"""Create a new incident."""
self._incident_counter += 1
incident_id = f"INC-{self._incident_counter:04d}"
incident = Incident(
id=incident_id,
title=title,
severity=severity,
description=description,
affected_systems=affected_systems or []
)
# Add creation to timeline
incident.timeline.append({
'time': datetime.now().isoformat(),
'event': 'Incident created',
'details': description
})
self.incidents[incident_id] = incident
return incident
def update_status(self, incident_id: str, new_status: str, notes: str = ""):
"""Update incident status."""
if incident_id not in self.incidents:
raise ValueError(f"Incident {incident_id} not found")
incident = self.incidents[incident_id]
old_status = incident.status
incident.status = new_status
incident.timeline.append({
'time': datetime.now().isoformat(),
'event': f'Status changed: {old_status} -> {new_status}',
'details': notes
})
if new_status == 'resolved':
incident.resolved_at = datetime.now()
def add_timeline_entry(self, incident_id: str, event: str, details: str = ""):
"""Add an entry to the incident timeline."""
if incident_id not in self.incidents:
raise ValueError(f"Incident {incident_id} not found")
self.incidents[incident_id].timeline.append({
'time': datetime.now().isoformat(),
'event': event,
'details': details
})
def add_runbook(self, runbook: Runbook):
"""Add a runbook to the library."""
self.runbooks[runbook.name] = runbook
def find_runbook(self, symptoms: List[str]) -> Optional[Runbook]:
"""Return the first runbook with a listed symptom that contains any of the given search terms."""
for runbook in self.runbooks.values():
matching = sum(1 for s in symptoms if any(s.lower() in rs.lower() for rs in runbook.symptoms))
if matching > 0:
return runbook
return None
def generate_postmortem(self, incident_id: str) -> str:
"""Generate a post-mortem report for an incident."""
if incident_id not in self.incidents:
raise ValueError(f"Incident {incident_id} not found")
incident = self.incidents[incident_id]
duration = "Ongoing"
if incident.resolved_at:
duration_mins = (incident.resolved_at - incident.created_at).total_seconds() / 60
duration = f"{duration_mins:.0f} minutes"
report = f"""
# Post-Mortem Report: {incident.id}
## Summary
- **Title**: {incident.title}
- **Severity**: {incident.severity.value.upper()}
- **Duration**: {duration}
- **Status**: {incident.status}
## Impact
{incident.impact or 'Not documented'}
## Affected Systems
{', '.join(incident.affected_systems) or 'Not specified'}
## Timeline
"""
for entry in incident.timeline:
report += f"- **{entry['time']}**: {entry['event']}\n"
if entry.get('details'):
report += f" - {entry['details']}\n"
report += f"""
## Root Cause
{incident.root_cause or 'Under investigation'}
## Resolution
{incident.resolution or 'Not yet resolved'}
## Action Items
- [ ] Document lessons learned
- [ ] Update monitoring/alerting
- [ ] Review and update runbooks
- [ ] Schedule follow-up review
"""
return report
# Create incident manager and add runbooks
incidents = IncidentManager()
# Add some runbooks
runbook_db = Runbook(
name='database_connection_failure',
description='Steps to diagnose and recover from database connection issues',
symptoms=['database connection', 'connection refused', 'timeout', 'postgres'],
steps=[
{'step': 'Check database server status', 'expected_outcome': 'Server should be running'},
{'step': 'Verify network connectivity', 'expected_outcome': 'Ping should succeed'},
{'step': 'Check connection pool exhaustion', 'expected_outcome': 'Connections < max_pool'},
{'step': 'Review database logs', 'expected_outcome': 'Identify error messages'},
{'step': 'Restart connection pool if needed', 'expected_outcome': 'Connections restored'},
],
escalation='If not resolved in 15 minutes, page on-call DBA',
estimated_time_minutes=30
)
runbook_api = Runbook(
name='broker_api_failure',
description='Steps to handle broker API failures',
symptoms=['broker', 'api error', '401', '403', '500', 'order rejected'],
steps=[
{'step': 'Check broker status page', 'expected_outcome': 'Identify if broker-wide issue'},
{'step': 'Verify API credentials', 'expected_outcome': 'Credentials should be valid'},
{'step': 'Check rate limits', 'expected_outcome': 'Should be within limits'},
{'step': 'Enable backup broker if available', 'expected_outcome': 'Orders route to backup'},
{'step': 'Pause new order submission', 'expected_outcome': 'Prevent further failures'},
],
escalation='Contact broker support and notify risk team',
estimated_time_minutes=15
)
incidents.add_runbook(runbook_db)
incidents.add_runbook(runbook_api)
print("Runbooks Loaded:")
for name, rb in incidents.runbooks.items():
print(f" - {name}: {rb.description}")
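When several runbooks share keywords, returning the first match can pick a poor fit. A possible alternative, sketched below, ranks runbooks by how many of their keywords appear in the observed symptoms. The function name and the plain dict-of-lists input are assumptions for this example, not part of `IncidentManager`.

```python
from typing import Dict, List, Tuple


def rank_runbooks(observed: List[str],
                  runbooks: Dict[str, List[str]]) -> List[Tuple[str, int]]:
    """Return (runbook_name, score) pairs, highest score first.

    The score is the number of runbook keywords found in at least one observation.
    """
    lowered = [text.lower() for text in observed]
    scores = []
    for name, keywords in runbooks.items():
        score = sum(1 for kw in keywords if any(kw.lower() in text for text in lowered))
        if score > 0:
            scores.append((name, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


runbooks = {
    'database_connection_failure': ['database connection', 'connection refused',
                                    'timeout', 'postgres'],
    'broker_api_failure': ['broker', 'api error', '401', '403', '500', 'order rejected'],
}
observed = ['postgres connection refused after timeout', 'broker heartbeat delayed']
print(rank_runbooks(observed, runbooks))
```

Short numeric keywords such as `500` can also match unrelated text like port numbers or order sizes, so a production matcher would likely need stricter patterns.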
Exercise 18.3: Backup Job Configuration (Guided)
Create a function that validates backup job configurations and calculates retention requirements.
Solution:
def validate_backup_config(jobs: List[Dict]) -> Dict:
"""
Validate backup job configurations and calculate storage needs.
Args:
jobs: List of backup job configurations
Returns:
Validation results and storage estimates
"""
results = {
'valid': True,
'errors': [],
'warnings': [],
'jobs': [],
'total_daily_storage_gb': 0,
'total_retention_storage_gb': 0
}
required_fields = ['name', 'frequency_hours', 'retention_days', 'estimated_size_gb']
for job in jobs:
job_result = {'name': job.get('name', 'unknown'), 'valid': True}
for field_name in required_fields: # avoid shadowing dataclasses.field
if field_name not in job:
job_result['valid'] = False
results['errors'].append(f"{job_result['name']}: missing {field_name}")
if not job_result['valid']:
results['jobs'].append(job_result)
continue
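# Guard against division by zero and nonsensical schedules
if job['frequency_hours'] <= 0:
job_result['valid'] = False
results['errors'].append(f"{job['name']}: frequency_hours must be positive")
results['jobs'].append(job_result)
continue
backups_per_day = 24 / job['frequency_hours']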
daily_storage = backups_per_day * job['estimated_size_gb']
job_result['daily_storage_gb'] = daily_storage
retention_storage = daily_storage * job['retention_days']
job_result['retention_storage_gb'] = retention_storage
results['total_daily_storage_gb'] += daily_storage
results['total_retention_storage_gb'] += retention_storage
if job['retention_days'] < 7:
results['warnings'].append(f"{job['name']}: retention less than 7 days")
if backups_per_day < 1:
results['warnings'].append(f"{job['name']}: backup interval exceeds 24 hours")
results['jobs'].append(job_result)
results['valid'] = len(results['errors']) == 0
return results
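To make the storage arithmetic concrete, here is a worked example with hypothetical numbers. It assumes every run is a full backup of the same size, with no compression gains, deduplication, or incremental backups.

```python
frequency_hours = 4      # hypothetical job: one backup every 4 hours
estimated_size_gb = 2.0  # hypothetical size of each backup
retention_days = 90

backups_per_day = 24 / frequency_hours
daily_storage_gb = backups_per_day * estimated_size_gb
retention_storage_gb = daily_storage_gb * retention_days

print(f"Backups per day:   {backups_per_day:.0f}")
print(f"Daily storage:     {daily_storage_gb:.1f} GB")
print(f"Retention storage: {retention_storage_gb:.1f} GB")
```

Six 2 GB backups per day come to 12 GB per day, or 1,080 GB held at steady state over a 90-day retention window.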
Section 18.4: Backup & Recovery
Data is your most valuable asset. Losing trade history, positions, or configuration can be catastrophic.
Backup Strategy
| Data Type | Backup Frequency | Retention | Recovery Time Objective |
|---|---|---|---|
| Trade database | Continuous (replication) | 90 days | < 1 hour |
| Configuration | On change | Forever | < 15 minutes |
| Market data | Daily | 30 days | < 4 hours |
| Logs | Daily | 7 days | < 1 hour |
@dataclass
class BackupJob:
"""Represents a backup job."""
name: str
source: str
destination: str
schedule: str # cron expression or description
retention_days: int
last_run: Optional[datetime] = None
last_status: str = "never_run"
last_size_mb: float = 0
@dataclass
class Backup:
"""Represents a completed backup."""
job_name: str
timestamp: datetime
path: str
size_mb: float
checksum: str
metadata: Dict = field(default_factory=dict)
class BackupManager:
"""
Manages backup jobs and recovery.
"""
def __init__(self):
self.jobs: Dict[str, BackupJob] = {}
self.backups: List[Backup] = []
def add_job(self, job: BackupJob):
"""Add a backup job."""
self.jobs[job.name] = job
def simulate_backup(self, job_name: str) -> Backup:
"""
Simulate running a backup job.
"""
if job_name not in self.jobs:
raise ValueError(f"Job {job_name} not found")
job = self.jobs[job_name]
# Simulate backup creation
timestamp = datetime.now()
size_mb = random.uniform(10, 500) # Simulated size
checksum = f"sha256:{random.getrandbits(256):064x}" # Random placeholder, not a real hash of the data
backup = Backup(
job_name=job_name,
timestamp=timestamp,
path=f"{job.destination}/{job_name}_{timestamp.strftime('%Y%m%d_%H%M%S')}.backup",
size_mb=size_mb,
checksum=checksum,
metadata={
'source': job.source,
'compression': 'gzip',
'encrypted': True
}
)
# Update job status
job.last_run = timestamp
job.last_status = 'success'
job.last_size_mb = size_mb
self.backups.append(backup)
return backup
def list_backups(self, job_name: Optional[str] = None, days: int = 7) -> List[Backup]:
"""List recent backups."""
cutoff = datetime.now() - timedelta(days=days)
backups = [
b for b in self.backups
if b.timestamp > cutoff and (job_name is None or b.job_name == job_name)
]
return sorted(backups, key=lambda b: b.timestamp, reverse=True)
def get_backup_report(self) -> Dict:
"""Generate backup status report."""
report = {
'timestamp': datetime.now().isoformat(),
'jobs': [],
'total_backups': len(self.backups),
'total_size_gb': sum(b.size_mb for b in self.backups) / 1024
}
for name, job in self.jobs.items():
recent_backups = self.list_backups(name, days=7)
job_report = {
'name': name,
'schedule': job.schedule,
'last_run': job.last_run.isoformat() if job.last_run else 'Never',
'last_status': job.last_status,
'backups_last_7d': len(recent_backups),
'retention_days': job.retention_days
}
report['jobs'].append(job_report)
return report
# Create backup manager and jobs
backups = BackupManager()
# Add backup jobs
backups.add_job(BackupJob(
name='trade_database',
source='postgresql://localhost:5432/trading',
destination='s3://backups/database',
schedule='0 */4 * * *', # Every 4 hours
retention_days=90
))
backups.add_job(BackupJob(
name='configuration',
source='/etc/trading/',
destination='s3://backups/config',
schedule='On change',
retention_days=365
))
# Simulate some backups
for job_name in backups.jobs.keys():
for _ in range(3):
backups.simulate_backup(job_name)
# Display report
report = backups.get_backup_report()
print("Backup Status Report")
print("=" * 60)
print(f"Total Backups: {report['total_backups']}")
print(f"Total Size: {report['total_size_gb']:.2f} GB")
print()
for job in report['jobs']:
status_icon = "✓" if job['last_status'] == 'success' else "✗"
print(f"{status_icon} {job['name']}")
print(f" Schedule: {job['schedule']}")
print(f" Last Run: {job['last_run']}")
print(f" Retention: {job['retention_days']} days")
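The simulated manager stores a random checksum and never deletes old backups. The sketch below shows how those two gaps might be closed, using `hashlib` for a real SHA-256 digest and a simple retention cutoff. The function names and the `(path, timestamp)` tuple shape are assumptions for this example.

```python
import hashlib
from datetime import datetime, timedelta
from typing import List, Tuple


def sha256_checksum(payload: bytes) -> str:
    """Return a checksum string in the 'sha256:<hex digest>' shape."""
    return f"sha256:{hashlib.sha256(payload).hexdigest()}"


def verify_backup(payload: bytes, expected_checksum: str) -> bool:
    """True if the payload still matches the checksum recorded at backup time."""
    return sha256_checksum(payload) == expected_checksum


def prune_expired(backups: List[Tuple[str, datetime]], retention_days: int,
                  now: datetime) -> List[Tuple[str, datetime]]:
    """Keep only (path, timestamp) entries newer than the retention cutoff."""
    cutoff = now - timedelta(days=retention_days)
    return [(path, ts) for path, ts in backups if ts > cutoff]


payload = b"trade history snapshot"
checksum = sha256_checksum(payload)
print(verify_backup(payload, checksum))         # intact copy
print(verify_backup(payload + b"!", checksum))  # corrupted copy

now = datetime(2024, 1, 31)
history = [("old.backup", datetime(2023, 10, 1)), ("new.backup", datetime(2024, 1, 30))]
print(prune_expired(history, retention_days=90, now=now))
```

Verifying checksums during periodic restore drills is what turns a backup into a tested recovery path. A backup that has never been restored is only assumed to work.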
Exercise 18.4: Complete Monitoring System (Open-ended)
Build a comprehensive monitoring system that tracks system health, collects metrics, and generates status reports.
Solution:
class MonitoringSystem:
"""
Comprehensive system monitoring with metrics, health checks, and reporting.
"""
def __init__(self, name: str, retention_hours: int = 24):
self.name = name
self.retention_hours = retention_hours
self.metrics: Dict[str, List[Dict]] = {}
self.health_checks: Dict[str, Callable] = {}
self.health_results: Dict[str, Dict] = {}
def record_metric(self, name: str, value: float, metric_type: str = 'gauge'):
"""Record a metric value."""
if name not in self.metrics:
self.metrics[name] = []
self.metrics[name].append({
'value': value,
'timestamp': datetime.now(),
'type': metric_type
})
# Cleanup old entries
cutoff = datetime.now() - timedelta(hours=self.retention_hours)
self.metrics[name] = [m for m in self.metrics[name] if m['timestamp'] > cutoff]
def get_metric_stats(self, name: str, hours: float = 1.0) -> Dict:
"""Get statistics for a metric."""
if name not in self.metrics:
return {}
cutoff = datetime.now() - timedelta(hours=hours)
values = [m['value'] for m in self.metrics[name] if m['timestamp'] > cutoff]
if not values:
return {}
return {
'current': values[-1],
'min': min(values),
'max': max(values),
'mean': sum(values) / len(values),
'count': len(values)
}
def register_health_check(self, name: str, check_func: Callable):
"""Register a health check function."""
self.health_checks[name] = check_func
def run_health_checks(self) -> Dict:
"""Run all health checks."""
self.health_results = {}
for name, check_func in self.health_checks.items():
try:
result = check_func()
self.health_results[name] = {
'healthy': result.get('healthy', True),
'message': result.get('message', 'OK'),
'timestamp': datetime.now().isoformat()
}
except Exception as e:
self.health_results[name] = {
'healthy': False,
'message': str(e),
'timestamp': datetime.now().isoformat()
}
return self.health_results
def is_healthy(self) -> bool:
"""Check if system is healthy."""
if not self.health_results:
self.run_health_checks()
return all(r['healthy'] for r in self.health_results.values())
def generate_report(self) -> str:
"""Generate status report."""
self.run_health_checks()
lines = [
f"MONITORING REPORT: {self.name}",
"=" * 50,
f"Time: {datetime.now().isoformat()}",
f"Overall Health: {'HEALTHY' if self.is_healthy() else 'UNHEALTHY'}",
"",
"HEALTH CHECKS:"
]
for name, result in self.health_results.items():
icon = "✓" if result['healthy'] else "✗"
lines.append(f" {icon} {name}: {result['message']}")
lines.append("\nMETRICS:")
for name in self.metrics.keys():
stats = self.get_metric_stats(name)
if stats:
lines.append(f" {name}: current={stats['current']:.1f}, mean={stats['mean']:.1f}")
return "\n".join(lines)
# Test
monitor = MonitoringSystem("Trading System")
# Add health checks
monitor.register_health_check('database', lambda: {'healthy': True, 'message': 'Connected'})
monitor.register_health_check('api', lambda: {'healthy': True, 'message': 'Responding'})
# Record metrics
for _ in range(10):
monitor.record_metric('cpu', random.uniform(30, 70))
monitor.record_metric('memory', random.uniform(50, 80))
# Generate report
print(monitor.generate_report())
Exercise 18.5: Incident Management System (Open-ended)
Create a comprehensive incident management system with runbooks and post-mortem generation.
Solution:
class IncidentManagementSystem:
"""
Comprehensive incident management with runbooks and reporting.
"""
def __init__(self):
self.incidents = {}
self.runbooks = {}
self._counter = 0
def add_runbook(self, name: str, symptoms: List[str], steps: List[str],
escalation: str):
"""Add a runbook to the library."""
self.runbooks[name] = {
'symptoms': symptoms,
'steps': steps,
'escalation': escalation
}
def find_runbook(self, symptoms: List[str]) -> Optional[str]:
"""Find matching runbook for symptoms."""
for name, runbook in self.runbooks.items():
for symptom in symptoms:
if any(symptom.lower() in rb_symptom.lower()
for rb_symptom in runbook['symptoms']):
return name
return None
def create_incident(self, title: str, severity: str, description: str,
affected_systems: Optional[List[str]] = None) -> str:
"""Create a new incident."""
self._counter += 1
incident_id = f"INC-{self._counter:04d}"
self.incidents[incident_id] = {
'title': title,
'severity': severity,
'description': description,
'status': 'open',
'affected_systems': affected_systems or [],
'created_at': datetime.now(),
'resolved_at': None,
'timeline': [{
'time': datetime.now().isoformat(),
'event': 'Incident created',
'details': description
}],
'root_cause': '',
'resolution': ''
}
return incident_id
def update_status(self, incident_id: str, status: str, notes: str = ""):
"""Update incident status."""
if incident_id not in self.incidents:
raise ValueError(f"Incident {incident_id} not found")
incident = self.incidents[incident_id]
old_status = incident['status']
incident['status'] = status
incident['timeline'].append({
'time': datetime.now().isoformat(),
'event': f'Status: {old_status} -> {status}',
'details': notes
})
if status == 'resolved':
incident['resolved_at'] = datetime.now()
def add_note(self, incident_id: str, event: str, details: str = ""):
"""Add timeline note to incident."""
if incident_id not in self.incidents:
raise ValueError(f"Incident {incident_id} not found")
self.incidents[incident_id]['timeline'].append({
'time': datetime.now().isoformat(),
'event': event,
'details': details
})
def generate_postmortem(self, incident_id: str) -> str:
"""Generate post-mortem report."""
if incident_id not in self.incidents:
raise ValueError(f"Incident {incident_id} not found")
inc = self.incidents[incident_id]
duration = "Ongoing"
if inc['resolved_at']:
mins = (inc['resolved_at'] - inc['created_at']).total_seconds() / 60
duration = f"{mins:.0f} minutes"
lines = [
f"POST-MORTEM: {incident_id}",
"=" * 50,
f"Title: {inc['title']}",
f"Severity: {inc['severity']}",
f"Duration: {duration}",
f"Status: {inc['status']}",
"",
"TIMELINE:"
]
for entry in inc['timeline']:
lines.append(f" {entry['time']}: {entry['event']}")
if entry.get('details'):
lines.append(f" -> {entry['details']}")
lines.extend([
"",
f"ROOT CAUSE: {inc['root_cause'] or 'TBD'}",
f"RESOLUTION: {inc['resolution'] or 'TBD'}"
])
return "\n".join(lines)
# Test
ims = IncidentManagementSystem()
# Add runbook
ims.add_runbook(
'database_issues',
symptoms=['database', 'connection', 'timeout'],
steps=['Check server status', 'Verify connectivity', 'Restart if needed'],
escalation='Page DBA'
)
# Create and manage incident
inc_id = ims.create_incident(
title='Database Connection Issues',
severity='SEV2',
description='Multiple connection timeouts',
affected_systems=['api', 'orders']
)
ims.update_status(inc_id, 'investigating', 'Engineer assigned')
ims.add_note(inc_id, 'Root cause identified', 'Connection pool exhausted')
ims.incidents[inc_id]['root_cause'] = 'Connection pool exhaustion'
ims.incidents[inc_id]['resolution'] = 'Restarted services'
ims.update_status(inc_id, 'resolved', 'Services restored')
print(ims.generate_postmortem(inc_id))
Exercise 18.6: Operations Dashboard (Open-ended)
Build a unified operations dashboard that combines monitoring, alerts, incidents, and backups into a single view.
Solution:
class OperationsDashboard:
"""
Unified operations dashboard combining all monitoring components.
"""
def __init__(self, name: str):
self.name = name
self.metrics = MetricsCollector()
self.health = HealthChecker()
self.alerts = AlertManager()
self.backups = BackupManager()
self.active_incidents = []
def collect_metrics(self, metrics_data: Dict):
"""Collect system metrics."""
for name, value in metrics_data.items():
self.metrics.record(name, value)
self.alerts.check_rules(metrics_data)
def add_health_check(self, name: str, check_func: Callable):
"""Register a health check."""
self.health.register_check(name, check_func)
def add_backup_job(self, job: BackupJob):
"""Add a backup job."""
self.backups.add_job(job)
def get_overall_status(self) -> str:
"""Determine overall system status."""
# Check for critical alerts
alert_summary = self.alerts.get_alert_summary()
if alert_summary['critical_count'] > 0:
return 'critical'
# Check health
if not self.health.is_healthy():
return 'warning'
# Check for warnings
if alert_summary['warning_count'] > 0:
return 'warning'
return 'healthy'
def generate_dashboard(self) -> str:
"""Generate comprehensive dashboard display."""
status = self.get_overall_status()
status_emoji = {'healthy': '🟢', 'warning': '🟡', 'critical': '🔴'}
lines = [
"=" * 60,
f"OPERATIONS DASHBOARD: {self.name}",
f"Time: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}",
"=" * 60,
"",
f"OVERALL STATUS: {status_emoji.get(status, '⚪')} {status.upper()}",
""
]
# Health checks
lines.append("HEALTH CHECKS")
lines.append("-" * 40)
health_status = self.health.get_status()
lines.append(f"Status: {health_status['healthy_checks']}/{health_status['total_checks']} healthy")
for name, result in health_status.get('checks', {}).items():
icon = "✓" if result['healthy'] else "✗"
lines.append(f" {icon} {name}: {result['message']}")
lines.append("")
# Alerts
lines.append("ALERTS")
lines.append("-" * 40)
alert_summary = self.alerts.get_alert_summary()
lines.append(f"Active: {alert_summary['active_count']} (Critical: {alert_summary['critical_count']}, Warning: {alert_summary['warning_count']})")
for alert in alert_summary.get('active_alerts', [])[:5]:
lines.append(f" [{alert['severity'].upper()}] {alert['name']}: {alert['message']}")
lines.append("")
# Key metrics
lines.append("KEY METRICS")
lines.append("-" * 40)
for metric_name in ['cpu', 'memory', 'latency']:
stats = self.metrics.get_stats(metric_name)
if stats:
lines.append(f" {metric_name}: mean={stats['mean']:.1f}, p95={stats['p95']:.1f}")
lines.append("")
# Backups
lines.append("BACKUPS")
lines.append("-" * 40)
backup_report = self.backups.get_backup_report()
lines.append(f"Total: {backup_report['total_backups']} backups")
for job in backup_report['jobs']:
icon = "✓" if job['last_status'] == 'success' else "✗"
lines.append(f" {icon} {job['name']}: {job['last_status']}")
lines.append("")
lines.append("=" * 60)
return "\n".join(lines)
# Test
dashboard = OperationsDashboard("Quant Trading System")
# Setup health checks
dashboard.add_health_check('database', lambda: {'healthy': True, 'message': 'Connected'})
dashboard.add_health_check('broker', lambda: {'healthy': True, 'message': 'API responding'})
# Setup alert rules
dashboard.alerts.add_rule(
'high_cpu', lambda ctx: ctx.get('cpu', 0) > 80,
AlertSeverity.WARNING, 'CPU at {cpu:.1f}%'
)
# Add backup job
dashboard.add_backup_job(BackupJob(
name='database',
source='postgres',
destination='s3://backups',
schedule='0 */4 * * *',
retention_days=90
))
dashboard.backups.simulate_backup('database')
# Collect metrics
for _ in range(30):
dashboard.collect_metrics({
'cpu': random.uniform(30, 70),
'memory': random.uniform(50, 80),
'latency': random.expovariate(1 / 50) + 30  # exponential with mean 50; the stdlib random module has no exponential()
})
print(dashboard.generate_dashboard())
Module Project: Complete Operations System
Build a comprehensive operations system that brings together all monitoring, alerting, and incident management components.
class TradingOperationsSystem:
"""
Complete operations system for trading infrastructure.
Features:
- Real-time metrics collection
- Health monitoring
- Alert management
- Incident tracking
- Backup management
"""
def __init__(self, name: str = "Trading Operations"):
self.name = name
self.metrics = MetricsCollector()
self.health = HealthChecker()
self.alerts = AlertManager()
self.incidents = IncidentManager()
self.backups = BackupManager()
self._setup_defaults()
def _setup_defaults(self):
"""Setup default monitoring configuration."""
# Default health checks
self.health.register_check('database', lambda: {'healthy': True, 'message': 'Connected'})
self.health.register_check('broker_api', lambda: {'healthy': True, 'message': 'Responding'})
self.health.register_check('market_data', lambda: {'healthy': True, 'message': 'Streaming'})
# Default alert rules
self.alerts.add_rule(
'high_cpu', lambda ctx: ctx.get('cpu', 0) > 80,
AlertSeverity.WARNING, 'CPU at {cpu:.1f}%'
)
self.alerts.add_rule(
'high_latency', lambda ctx: ctx.get('latency', 0) > 500,
AlertSeverity.WARNING, 'Latency at {latency:.0f}ms'
)
self.alerts.add_rule(
'large_drawdown', lambda ctx: ctx.get('drawdown', 0) > 0.10,
AlertSeverity.CRITICAL, 'Drawdown at {drawdown:.1%}'
)
# Default backup jobs
self.backups.add_job(BackupJob(
name='trade_database',
source='postgresql://localhost/trading',
destination='s3://backups/db',
schedule='0 */4 * * *',
retention_days=90
))
def collect_metrics(self, data: Dict):
"""Collect system metrics and check alerts."""
for name, value in data.items():
self.metrics.record(name, value)
self.alerts.check_rules(data)
def get_system_status(self) -> Dict:
"""Get comprehensive system status."""
health_status = self.health.get_status()
alert_summary = self.alerts.get_alert_summary()
# Determine overall status
if alert_summary['critical_count'] > 0:
overall = 'critical'
elif alert_summary['warning_count'] > 0 or health_status['status'] != 'healthy':
overall = 'warning'
else:
overall = 'healthy'
return {
'timestamp': datetime.now().isoformat(),
'overall_status': overall,
'health': health_status,
'alerts': alert_summary,
'backup_status': self.backups.get_backup_report()
}
def generate_dashboard(self) -> str:
"""Generate text-based dashboard."""
status = self.get_system_status()
emoji = {'healthy': '🟢', 'warning': '🟡', 'critical': '🔴'}
lines = [
"=" * 60,
f"OPERATIONS DASHBOARD: {self.name}",
f"Time: {status['timestamp']}",
"=" * 60,
"",
f"OVERALL: {emoji.get(status['overall_status'], '⚪')} {status['overall_status'].upper()}",
"",
"HEALTH CHECKS:"
]
for name, result in status['health'].get('checks', {}).items():
icon = "✓" if result['healthy'] else "✗"
lines.append(f" {icon} {name}: {result['message']}")
lines.append("\nALERTS:")
alerts = status['alerts']
lines.append(f" Active: {alerts['active_count']} (Critical: {alerts['critical_count']}, Warning: {alerts['warning_count']})")
lines.append("\nBACKUPS:")
for job in status['backup_status']['jobs']:
icon = "✓" if job['last_status'] == 'success' else "✗"
lines.append(f" {icon} {job['name']}: {job['last_status']}")
lines.append("\n" + "=" * 60)
return "\n".join(lines)
# Create and test system
ops = TradingOperationsSystem("Quant Trading Platform")
# Simulate operations
for _ in range(30):
ops.collect_metrics({
'cpu': random.uniform(30, 70),
'memory': random.uniform(50, 80),
'latency': random.expovariate(1 / 50) + 30,  # exponential with mean 50; the stdlib random module has no exponential()
'drawdown': random.uniform(0, 0.05)
})
# Run backup
ops.backups.simulate_backup('trade_database')
# Display dashboard
print(ops.generate_dashboard())
Key Takeaways
System Monitoring
- Monitor the Four Golden Signals: latency, traffic, errors, saturation
- Health checks should be fast and deterministic
- Track both infrastructure and business metrics
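As a rough illustration of the four signals, the sketch below summarizes a window of request records. The record fields (`latency_ms`, `status`), the `capacity_rps` parameter, and the choice to treat HTTP status 500 and above as an error are all assumptions made for this example, not part of the module's classes.

```python
import math
from typing import Dict, List


def golden_signals(records: List[Dict], window_seconds: float,
                   capacity_rps: float) -> Dict[str, float]:
    """Summarize latency, traffic, errors, and saturation for one time window."""
    latencies = sorted(r["latency_ms"] for r in records)
    # Nearest-rank 95th percentile: the value at position ceil(0.95 * n)
    rank = math.ceil(95 * len(latencies) / 100)
    traffic = len(records) / window_seconds
    errors = sum(1 for r in records if r["status"] >= 500)
    return {
        "latency_p95_ms": latencies[rank - 1],
        "traffic_rps": traffic,
        "error_rate": errors / len(records),
        # Saturation is modeled here as traffic relative to an assumed capacity
        "saturation": traffic / capacity_rps,
    }


# 20 hypothetical requests over 10 seconds, one of which failed
sample = [{"latency_ms": 10 * (i + 1), "status": 500 if i == 0 else 200}
          for i in range(20)]
signals = golden_signals(sample, window_seconds=10, capacity_rps=4)
print(signals)
```

A tail percentile is used rather than the mean because a handful of slow requests can hide behind a healthy-looking average.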
Alerting
- Every alert should be actionable
- Use appropriate severity levels
- Implement cooldowns to prevent alert storms
- Route critical alerts to on-call systems
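One simple way to implement a cooldown is to remember when each alert last fired and drop repeats that arrive too soon. The sketch below is a minimal, self-contained illustration; `CooldownGate` and its method names are hypothetical and are not the `AlertManager` used earlier in this module.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional


class CooldownGate:
    """Suppress repeat notifications for the same alert within a cooldown window."""

    def __init__(self, cooldown: timedelta):
        self.cooldown = cooldown
        self._last_sent: Dict[str, datetime] = {}

    def should_notify(self, alert_name: str, now: Optional[datetime] = None) -> bool:
        now = now or datetime.now()
        last = self._last_sent.get(alert_name)
        if last is not None and now - last < self.cooldown:
            return False  # still cooling down: drop the duplicate
        self._last_sent[alert_name] = now
        return True


gate = CooldownGate(cooldown=timedelta(minutes=5))
t0 = datetime(2024, 1, 1, 9, 30)
first = gate.should_notify("high_cpu", now=t0)
repeat = gate.should_notify("high_cpu", now=t0 + timedelta(minutes=2))
later = gate.should_notify("high_cpu", now=t0 + timedelta(minutes=6))
print(first, repeat, later)
```

Suppressed repeats do not reset the timer here, so a condition that keeps firing still produces one notification per cooldown period rather than going silent.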
Incident Response
- Have runbooks for common issues
- Document everything in the incident timeline
- Conduct post-mortems to prevent recurrence
- Focus on "how to prevent" not "who to blame"
Backup & Recovery
- Define RPO (Recovery Point Objective) and RTO (Recovery Time Objective)
- Test restores regularly
- Encrypt backups at rest and in transit
- Automate backup verification
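An RPO can be monitored directly: if the newest successful backup is older than the RPO, more data than agreed would be lost in a failure. The sketch below assumes a 4-hour RPO, chosen only to match the every-four-hours schedule used in the backup examples; the function name `rpo_breached` is hypothetical.

```python
from datetime import datetime, timedelta


def rpo_breached(last_backup: datetime, rpo: timedelta, now: datetime) -> bool:
    """Return True when the newest backup is older than the RPO allows."""
    return now - last_backup > rpo


now = datetime(2024, 1, 1, 12, 0)
rpo = timedelta(hours=4)
fresh = rpo_breached(datetime(2024, 1, 1, 9, 0), rpo, now)  # backup is 3 hours old
stale = rpo_breached(datetime(2024, 1, 1, 6, 0), rpo, now)  # backup is 6 hours old
print(fresh, stale)
```

A check like this fits naturally as a health check or alert rule, so a silently failing backup job is noticed before a restore is needed.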
Best Practices
- Design for failure - everything will break eventually
- Automate repetitive operational tasks
- Keep runbooks up to date
- Practice incident response with game days
- Monitor your monitoring (meta-monitoring)
Congratulations on completing all modules! Now proceed to the Capstone Project to bring everything together.
Capstone Project: Complete Quantitative Trading System
Course 3: Quantitative Finance & Portfolio Theory
| Attribute | Value |
|---|---|
| Duration | ~4-5 hours |
| Exercises | 6 (3 guided + 3 open-ended) |
Project Overview
You will build a complete quantitative trading system that integrates:
- Data Pipeline - Market data collection and storage
- Strategy Engine - Multi-strategy portfolio management
- Risk Management - Real-time risk monitoring and limits
- Execution - Order management with cost awareness
- Dashboard & Reporting - Performance visualization and reports
Learning Integration
This project draws from every module in the course:
| Component | Modules Used |
|---|---|
| Portfolio Optimization | 4, 5, 6 |
| Risk Management | 7, 8, 9 |
| Simulation & Analysis | 10, 11 |
| Dashboard & Reporting | 12, 13 |
| Execution | 14, 15 |
| Infrastructure | 16, 17, 18 |
# Environment setup
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
from datetime import datetime, timedelta
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Callable
from enum import Enum
from scipy.optimize import minimize
import json
import warnings
warnings.filterwarnings('ignore')
# Display settings
pd.set_option('display.float_format', lambda x: f'{x:.4f}')
np.set_printoptions(precision=4)
print("Capstone Project: Complete Quantitative Trading System")
print("=" * 55)
Part 1: Data Pipeline
Build a data pipeline that:
- Fetches market data from multiple sources
- Calculates derived metrics (returns, volatility, etc.)
- Stores data efficiently
class DataPipeline:
"""
Market data pipeline for the trading system.
Responsibilities:
- Fetch historical and live market data
- Calculate returns and risk metrics
- Provide data to other system components
"""
def __init__(self, universe: List[str]):
self.universe = universe
self.prices = pd.DataFrame()
self.returns = pd.DataFrame()
self.metadata = {}
self.last_update = None
def fetch_historical_data(self, start_date: str, end_date: str = None) -> pd.DataFrame:
"""Fetch historical price data."""
end_date = end_date or datetime.now().strftime('%Y-%m-%d')
print(f"Fetching data for {len(self.universe)} symbols...")
data = yf.download(self.universe, start=start_date, end=end_date, progress=False)
# Handle MultiIndex columns
if isinstance(data.columns, pd.MultiIndex):
if 'Adj Close' in data.columns.get_level_values(0):
self.prices = data['Adj Close']
elif 'Close' in data.columns.get_level_values(0):
self.prices = data['Close']
else:
self.prices = data
# Ensure proper column order
self.prices = self.prices[self.universe]
# Calculate returns
self.returns = self.prices.pct_change().dropna()
self.last_update = datetime.now()
print(f"Loaded {len(self.prices)} days of data")
return self.prices
def calculate_metrics(self, lookback_days: int = 252) -> Dict:
"""Calculate key metrics for all assets."""
if self.returns.empty:
raise ValueError("No data loaded. Call fetch_historical_data first.")
recent_returns = self.returns.tail(lookback_days)
metrics = {}
for symbol in self.universe:
ret = recent_returns[symbol]
metrics[symbol] = {
'annual_return': ret.mean() * 252,
'annual_volatility': ret.std() * np.sqrt(252),
'sharpe_ratio': (ret.mean() * 252) / (ret.std() * np.sqrt(252)),
'max_drawdown': self._calculate_max_drawdown(self.prices[symbol].tail(lookback_days)),
'current_price': self.prices[symbol].iloc[-1]
}
return metrics
def _calculate_max_drawdown(self, prices: pd.Series) -> float:
"""Calculate maximum drawdown."""
cummax = prices.cummax()
drawdown = (prices - cummax) / cummax
return drawdown.min()
def get_correlation_matrix(self, lookback_days: int = 252) -> pd.DataFrame:
"""Get correlation matrix."""
return self.returns.tail(lookback_days).corr()
def get_covariance_matrix(self, lookback_days: int = 252, annualize: bool = True) -> pd.DataFrame:
"""Get covariance matrix."""
cov = self.returns.tail(lookback_days).cov()
if annualize:
cov = cov * 252
return cov
def get_latest_prices(self) -> pd.Series:
"""Get most recent prices."""
return self.prices.iloc[-1]
# Initialize data pipeline
UNIVERSE = ['SPY', 'QQQ', 'IWM', 'EFA', 'EEM', 'TLT', 'GLD', 'VNQ']
pipeline = DataPipeline(UNIVERSE)
pipeline.fetch_historical_data('2020-01-01')
# Display metrics
metrics = pipeline.calculate_metrics()
print("\nAsset Metrics:")
print("=" * 70)
metrics_df = pd.DataFrame(metrics).T
print(metrics_df.to_string())
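The annualization and drawdown calculations in `calculate_metrics` can be checked by hand on a tiny price series. The sketch below is a standard-library illustration under stated assumptions: the helper names `simple_returns` and `max_drawdown` and the four-point price series are invented for this example, and 252 trading days per year is assumed, as in the pipeline.

```python
import math
import statistics
from typing import List

TRADING_DAYS = 252  # assumed number of trading days per year


def simple_returns(prices: List[float]) -> List[float]:
    """Period-over-period simple returns: p_t / p_{t-1} - 1."""
    return [p1 / p0 - 1 for p0, p1 in zip(prices, prices[1:])]


def max_drawdown(prices: List[float]) -> float:
    """Largest peak-to-trough decline, returned as a negative fraction."""
    peak = prices[0]
    worst = 0.0
    for p in prices:
        peak = max(peak, p)                    # running maximum
        worst = min(worst, (p - peak) / peak)  # drawdown from that maximum
    return worst


prices = [100.0, 120.0, 90.0, 110.0]
rets = simple_returns(prices)

# Mean scales linearly with time, volatility with the square root of time
annual_return = statistics.mean(rets) * TRADING_DAYS
annual_vol = statistics.stdev(rets) * math.sqrt(TRADING_DAYS)
mdd = max_drawdown(prices)

print(f"max drawdown: {mdd:.2%}")
```

The peak is 120 and the trough after it is 90, so the maximum drawdown is (90 - 120) / 120 = -25%. `statistics.stdev` is the sample standard deviation, which matches the default of pandas `Series.std()`.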
Exercise C.1: Data Quality Validator (Guided)
Add data quality validation to the data pipeline to check for missing values and outliers.
# Exercise C.1: Data Quality Validator (Guided)
def validate_data_quality(prices: pd.DataFrame, returns: pd.DataFrame) -> Dict:
"""
Validate data quality and identify issues.
Args:
prices: Price DataFrame
returns: Returns DataFrame
Returns:
Validation results with quality metrics
"""
results = {
'valid': True,
'issues': [],
'metrics': {}
}
# Check for missing values in prices
# TODO: Count missing values per column
missing_counts = prices.______().______()
results['metrics']['missing_values'] = missing_counts.to_dict()
# TODO: Check if any column has missing values
if missing_counts.______() > 0:
results['issues'].append('Missing values detected')
# Check for outliers in returns (>5 standard deviations)
outlier_threshold = 5
outlier_counts = {}
for col in returns.columns:
# TODO: Calculate mean and standard deviation
mean = returns[col].______()
std = returns[col].______()
# TODO: Count outliers beyond threshold
outliers = ((returns[col] - mean).______() > outlier_threshold * std).sum()
outlier_counts[col] = ______
results['metrics']['outlier_counts'] = outlier_counts
# TODO: Check if total outliers exceed threshold
total_outliers = ______(outlier_counts.values())
if total_outliers > returns.size * 0.01:  # More than 1% of all observations (rows x columns)
results['issues'].______(f'High outlier count: {total_outliers}')
# Check data coverage
# TODO: Calculate trading days
trading_days = ______(prices)
results['metrics']['trading_days'] = trading_days
# Set overall validity
results['valid'] = len(results['issues']) == 0
return results
# Test
validation = validate_data_quality(pipeline.prices, pipeline.returns)
print(f"Data Valid: {validation['valid']}")
print(f"Trading Days: {validation['metrics']['trading_days']}")
print(f"Issues: {validation['issues']}")
Solution:
def validate_data_quality(prices: pd.DataFrame, returns: pd.DataFrame) -> Dict:
"""
Validate data quality and identify issues.
Args:
prices: Price DataFrame
returns: Returns DataFrame
Returns:
Validation results with quality metrics
"""
results = {
'valid': True,
'issues': [],
'metrics': {}
}
missing_counts = prices.isna().sum()
results['metrics']['missing_values'] = missing_counts.to_dict()
if missing_counts.sum() > 0:
results['issues'].append('Missing values detected')
outlier_threshold = 5
outlier_counts = {}
for col in returns.columns:
mean = returns[col].mean()
std = returns[col].std()
outliers = ((returns[col] - mean).abs() > outlier_threshold * std).sum()
outlier_counts[col] = outliers
results['metrics']['outlier_counts'] = outlier_counts
total_outliers = sum(outlier_counts.values())
if total_outliers > returns.size * 0.01:  # More than 1% of all observations (rows x columns)
results['issues'].append(f'High outlier count: {total_outliers}')
trading_days = len(prices)
results['metrics']['trading_days'] = trading_days
results['valid'] = len(results['issues']) == 0
return results
Part 2: Strategy Engine
Build a multi-strategy engine that:
- Implements multiple portfolio optimization strategies
- Manages strategy weights and allocation
- Generates trading signals
class Strategy:
"""Base class for trading strategies."""
def __init__(self, name: str):
self.name = name
def calculate_weights(self, data: DataPipeline) -> Dict[str, float]:
"""Calculate target weights. Override in subclass."""
raise NotImplementedError
class MeanVarianceStrategy(Strategy):
"""Mean-Variance Optimization Strategy."""
def __init__(self, target_return: float = 0.10):
super().__init__("Mean-Variance")
self.target_return = target_return
def calculate_weights(self, data: DataPipeline) -> Dict[str, float]:
returns = data.returns
n_assets = len(data.universe)
mu = returns.mean().values * 252
cov = returns.cov().values * 252
def objective(w):
return w @ cov @ w
constraints = [
{'type': 'eq', 'fun': lambda w: np.sum(w) - 1},
{'type': 'ineq', 'fun': lambda w: w @ mu - self.target_return}
]
bounds = [(0, 0.3) for _ in range(n_assets)]
result = minimize(objective, np.ones(n_assets)/n_assets,
method='SLSQP', bounds=bounds, constraints=constraints)
return dict(zip(data.universe, result.x))
class RiskParityStrategy(Strategy):
"""Risk Parity Strategy."""
def __init__(self):
super().__init__("Risk Parity")
def calculate_weights(self, data: DataPipeline) -> Dict[str, float]:
returns = data.returns
n_assets = len(data.universe)
cov = returns.cov().values * 252
target_risk = 1 / n_assets
def objective(w):
port_var = w @ cov @ w
marginal_contrib = cov @ w
# Fractional risk contributions sum to 1, so they are comparable to target_risk = 1/n
risk_contrib = w * marginal_contrib / port_var
return np.sum((risk_contrib - target_risk)**2)
constraints = [{'type': 'eq', 'fun': lambda w: np.sum(w) - 1}]
bounds = [(0.01, 0.5) for _ in range(n_assets)]
result = minimize(objective, np.ones(n_assets)/n_assets,
method='SLSQP', bounds=bounds, constraints=constraints)
return dict(zip(data.universe, result.x))
class MinimumVolatilityStrategy(Strategy):
"""Minimum Volatility Strategy."""
def __init__(self):
super().__init__("Minimum Volatility")
def calculate_weights(self, data: DataPipeline) -> Dict[str, float]:
returns = data.returns
n_assets = len(data.universe)
cov = returns.cov().values * 252
def objective(w):
return np.sqrt(w @ cov @ w)
constraints = [{'type': 'eq', 'fun': lambda w: np.sum(w) - 1}]
bounds = [(0, 0.4) for _ in range(n_assets)]
result = minimize(objective, np.ones(n_assets)/n_assets,
method='SLSQP', bounds=bounds, constraints=constraints)
return dict(zip(data.universe, result.x))
class StrategyEngine:
"""
Multi-strategy portfolio engine.
"""
def __init__(self, data_pipeline: DataPipeline):
self.data = data_pipeline
self.strategies: Dict[str, Strategy] = {}
self.strategy_allocations: Dict[str, float] = {}
self.combined_weights: Dict[str, float] = {}
def add_strategy(self, strategy: Strategy, allocation: float):
"""Add a strategy with given allocation."""
self.strategies[strategy.name] = strategy
self.strategy_allocations[strategy.name] = allocation
def calculate_all_weights(self) -> Dict[str, Dict[str, float]]:
"""Calculate weights for all strategies."""
all_weights = {}
for name, strategy in self.strategies.items():
try:
weights = strategy.calculate_weights(self.data)
all_weights[name] = weights
except Exception as e:
print(f"Error calculating {name} weights: {e}")
n = len(self.data.universe)
all_weights[name] = {s: 1/n for s in self.data.universe}
return all_weights
def combine_weights(self) -> Dict[str, float]:
"""Combine strategy weights based on allocations."""
all_weights = self.calculate_all_weights()
combined = {symbol: 0.0 for symbol in self.data.universe}
for strategy_name, strategy_weights in all_weights.items():
allocation = self.strategy_allocations.get(strategy_name, 0)
for symbol, weight in strategy_weights.items():
combined[symbol] += weight * allocation
total = sum(combined.values())
if total > 0:
combined = {s: w/total for s, w in combined.items()}
self.combined_weights = combined
return combined
def get_strategy_comparison(self) -> pd.DataFrame:
"""Get comparison of all strategy weights."""
all_weights = self.calculate_all_weights()
all_weights['Combined'] = self.combine_weights()
return pd.DataFrame(all_weights)
# Create strategy engine
engine = StrategyEngine(pipeline)
# Add strategies with allocations
engine.add_strategy(MeanVarianceStrategy(target_return=0.08), 0.40)
engine.add_strategy(RiskParityStrategy(), 0.35)
engine.add_strategy(MinimumVolatilityStrategy(), 0.25)
# Calculate and display weights
comparison = engine.get_strategy_comparison()
print("\nStrategy Weight Comparison:")
print("=" * 60)
print(comparison.round(3).to_string())
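To build intuition for risk parity, it helps to look at each asset's share of portfolio variance, w_i (Σw)_i / (wᵀΣw), which sums to 1 across assets. The sketch below is one common formulation written with plain lists; the function name `risk_contribution_shares` and the two-asset covariance matrix are assumptions made for the example.

```python
from typing import List


def risk_contribution_shares(weights: List[float],
                             cov: List[List[float]]) -> List[float]:
    """Each asset's fraction of total portfolio variance (fractions sum to 1)."""
    n = len(weights)
    # Marginal contribution of asset i: (cov @ w)_i
    marginal = [sum(cov[i][j] * weights[j] for j in range(n)) for i in range(n)]
    port_var = sum(weights[i] * marginal[i] for i in range(n))
    return [weights[i] * marginal[i] / port_var for i in range(n)]


# Two uncorrelated assets with 20% and 10% annual volatility
cov = [[0.04, 0.0],
       [0.0, 0.01]]

equal = risk_contribution_shares([0.5, 0.5], cov)
inv_vol = risk_contribution_shares([1 / 3, 2 / 3], cov)

print([round(x, 3) for x in equal])
print([round(x, 3) for x in inv_vol])
```

With equal weights the more volatile asset carries 80% of the risk. Weighting inversely to volatility equalizes the shares at 50% each, which is the exact risk parity solution only because the assets are uncorrelated; with correlations, an optimizer is needed.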
Exercise C.2: Strategy Performance Tracker (Guided)
Build a function that tracks and compares historical performance of each strategy.
# Exercise C.2: Strategy Performance Tracker (Guided)
def calculate_strategy_performance(returns: pd.DataFrame,
strategy_weights: Dict[str, Dict[str, float]]) -> Dict:
"""
Calculate historical performance for each strategy.
Args:
returns: Asset returns DataFrame
strategy_weights: Dict of {strategy_name: {symbol: weight}}
Returns:
Performance metrics for each strategy
"""
performance = {}
for strategy_name, weights in strategy_weights.items():
# TODO: Convert weights to array in same order as returns columns
weight_array = np.array([weights.______(col, 0) for col in returns.______])
# TODO: Calculate portfolio returns
port_returns = (returns * ______).sum(axis=1)
# TODO: Calculate cumulative returns
cumulative = (1 + port_returns).______()
# Calculate drawdown
running_max = cumulative.cummax()
drawdown = (cumulative - running_max) / running_max
# TODO: Calculate annual return
annual_return = port_returns.______() * 252
# TODO: Calculate annual volatility
annual_vol = port_returns.______() * np.sqrt(252)
performance[strategy_name] = {
# TODO: Calculate total return
'total_return': cumulative.______[-1] - 1,
'annual_return': annual_return,
'annual_volatility': annual_vol,
# TODO: Calculate Sharpe ratio
'sharpe_ratio': annual_return / ______ if annual_vol > 0 else 0,
'max_drawdown': drawdown.min()
}
return performance
# Test
strategy_weights = engine.calculate_all_weights()
perf = calculate_strategy_performance(pipeline.returns, strategy_weights)
print("Strategy Performance:")
print("=" * 60)
for name, metrics in perf.items():
print(f"\n{name}:")
print(f" Total Return: {metrics['total_return']:.2%}")
print(f" Sharpe Ratio: {metrics['sharpe_ratio']:.2f}")
print(f" Max Drawdown: {metrics['max_drawdown']:.2%}")
Solution:
def calculate_strategy_performance(returns: pd.DataFrame,
strategy_weights: Dict[str, Dict[str, float]]) -> Dict:
"""
Calculate historical performance for each strategy.
Args:
returns: Asset returns DataFrame
strategy_weights: Dict of {strategy_name: {symbol: weight}}
Returns:
Performance metrics for each strategy
"""
performance = {}
for strategy_name, weights in strategy_weights.items():
weight_array = np.array([weights.get(col, 0) for col in returns.columns])
port_returns = (returns * weight_array).sum(axis=1)
cumulative = (1 + port_returns).cumprod()
running_max = cumulative.cummax()
drawdown = (cumulative - running_max) / running_max
annual_return = port_returns.mean() * 252
annual_vol = port_returns.std() * np.sqrt(252)
performance[strategy_name] = {
'total_return': cumulative.iloc[-1] - 1,
'annual_return': annual_return,
'annual_volatility': annual_vol,
'sharpe_ratio': annual_return / annual_vol if annual_vol > 0 else 0,
'max_drawdown': drawdown.min()
}
return performance
Part 3: Risk Management
Build a risk management system that:
- Monitors portfolio risk in real-time
- Enforces risk limits
- Calculates VaR and other risk metrics
@dataclass
class RiskLimits:
"""Risk limit configuration."""
max_position_size: float = 0.25
max_sector_exposure: float = 0.40
max_portfolio_var: float = 0.02
max_drawdown: float = 0.15
min_cash: float = 0.05
class RiskManager:
"""
Portfolio risk management system.
"""
def __init__(self, data_pipeline: DataPipeline, limits: RiskLimits = None):
self.data = data_pipeline
self.limits = limits or RiskLimits()
self.alerts = []
def calculate_portfolio_var(self, weights: Dict[str, float],
confidence: float = 0.95,
method: str = 'historical') -> float:
"""Calculate portfolio Value at Risk."""
weight_array = np.array([weights.get(s, 0) for s in self.data.universe])
port_returns = (self.data.returns * weight_array).sum(axis=1)
if method == 'historical':
var = -np.percentile(port_returns, (1 - confidence) * 100)
elif method == 'parametric':
from scipy.stats import norm
mu = port_returns.mean()
sigma = port_returns.std()
var = -(mu + norm.ppf(1 - confidence) * sigma)
else:
raise ValueError(f"Unknown method: {method}")
return var
def calculate_portfolio_cvar(self, weights: Dict[str, float],
confidence: float = 0.95) -> float:
"""Calculate Conditional Value at Risk."""
weight_array = np.array([weights.get(s, 0) for s in self.data.universe])
port_returns = (self.data.returns * weight_array).sum(axis=1)
var = self.calculate_portfolio_var(weights, confidence)
cvar = -port_returns[port_returns <= -var].mean()
return cvar
def calculate_risk_metrics(self, weights: Dict[str, float]) -> Dict:
"""Calculate comprehensive risk metrics."""
weight_array = np.array([weights.get(s, 0) for s in self.data.universe])
port_returns = (self.data.returns * weight_array).sum(axis=1)
cov = self.data.get_covariance_matrix()
port_vol = np.sqrt(weight_array @ cov.values @ weight_array)
port_return = (self.data.returns.mean() * weight_array).sum() * 252
cumulative = (1 + port_returns).cumprod()
running_max = cumulative.cummax()
drawdown = (cumulative - running_max) / running_max
max_drawdown = drawdown.min()
return {
'annual_return': port_return,
'annual_volatility': port_vol,
'sharpe_ratio': port_return / port_vol if port_vol > 0 else 0,
'var_95': self.calculate_portfolio_var(weights, 0.95),
'cvar_95': self.calculate_portfolio_cvar(weights, 0.95),
'max_drawdown': max_drawdown,
'current_drawdown': drawdown.iloc[-1]
}
def check_limits(self, weights: Dict[str, float]) -> List[Dict]:
"""Check if weights violate any risk limits."""
violations = []
for symbol, weight in weights.items():
if weight > self.limits.max_position_size:
violations.append({
'type': 'position_size',
'symbol': symbol,
'value': weight,
'limit': self.limits.max_position_size,
'message': f"{symbol} weight {weight:.1%} exceeds limit {self.limits.max_position_size:.1%}"
})
var = self.calculate_portfolio_var(weights, 0.95)
if var > self.limits.max_portfolio_var:
violations.append({
'type': 'var',
'value': var,
'limit': self.limits.max_portfolio_var,
'message': f"Portfolio VaR {var:.2%} exceeds limit {self.limits.max_portfolio_var:.2%}"
})
self.alerts = violations
return violations
def get_risk_report(self, weights: Dict[str, float]) -> str:
"""Generate risk report."""
metrics = self.calculate_risk_metrics(weights)
violations = self.check_limits(weights)
report = []
report.append("="*50)
report.append("RISK MANAGEMENT REPORT")
report.append("="*50)
report.append("")
report.append("Portfolio Risk Metrics:")
report.append("-"*30)
report.append(f" Expected Return: {metrics['annual_return']:.2%}")
report.append(f" Volatility: {metrics['annual_volatility']:.2%}")
report.append(f" Sharpe Ratio: {metrics['sharpe_ratio']:.2f}")
report.append(f" VaR (95%): {metrics['var_95']:.2%}")
report.append(f" Max Drawdown: {metrics['max_drawdown']:.2%}")
report.append("")
report.append("Limit Violations:")
report.append("-"*30)
if violations:
for v in violations:
report.append(f" ⚠️ {v['message']}")
else:
report.append(" ✓ All limits within bounds")
report.append("")
report.append("="*50)
return "\n".join(report)
# Create risk manager
risk_manager = RiskManager(pipeline)
# Get combined weights
combined_weights = engine.combine_weights()
# Generate risk report
print(risk_manager.get_risk_report(combined_weights))
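The difference between historical VaR, CVaR, and parametric VaR is easier to see on a small deterministic sample. The sketch below uses only the standard library and a simple tail-count convention for the empirical quantile, so its numbers will differ slightly from `np.percentile`, which interpolates. The function names and the synthetic return series are assumptions made for this example.

```python
import statistics
from statistics import NormalDist
from typing import List, Tuple


def historical_var_cvar(returns: List[float],
                        confidence: float = 0.95) -> Tuple[float, float]:
    """Historical VaR and CVaR, both reported as positive loss fractions."""
    ordered = sorted(returns)  # worst returns first
    n_tail = max(1, round((1 - confidence) * len(ordered)))
    tail = ordered[:n_tail]
    var = -tail[-1]               # loss at the edge of the tail
    cvar = -sum(tail) / len(tail)  # average loss inside the tail
    return var, cvar


def parametric_var(returns: List[float], confidence: float = 0.95) -> float:
    """Normal-distribution VaR: -(mu + z * sigma) with z the lower-tail quantile."""
    mu = statistics.mean(returns)
    sigma = statistics.stdev(returns)
    z = NormalDist().inv_cdf(1 - confidence)
    return -(mu + z * sigma)


# 100 synthetic daily returns from -5.0% to +4.9% in 0.1% steps
returns = [i / 1000 for i in range(-50, 50)]

var_hist, cvar_hist = historical_var_cvar(returns)
var_param = parametric_var(returns)

print(f"Historical VaR:  {var_hist:.2%}")
print(f"Historical CVaR: {cvar_hist:.2%}")
print(f"Parametric VaR:  {var_param:.2%}")
```

CVaR is never smaller than VaR at the same confidence level, because it averages the losses at and beyond the VaR cutoff.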
Exercise C.3: Dynamic Risk Adjustment (Guided)
Create a function that adjusts portfolio weights to respect risk limits.
# Exercise C.3: Dynamic Risk Adjustment (Guided)
def adjust_weights_for_risk(weights: Dict[str, float],
                            risk_manager: RiskManager,
                            max_iterations: int = 10) -> Dict:
    """
    Adjust weights to meet risk limits.
    Args:
        weights: Target portfolio weights
        risk_manager: RiskManager instance
        max_iterations: Max adjustment iterations
    Returns:
        Adjusted weights and adjustment info
    """
    # TODO: Create copy of weights
    adjusted = weights.______()
    adjustments_made = []
    for iteration in range(max_iterations):
        # TODO: Check current limits
        violations = risk_manager.______(adjusted)
        # TODO: Exit if no violations
        if not ______:
            break
        for violation in violations:
            if violation['type'] == 'position_size':
                symbol = violation['symbol']
                # TODO: Get limit from violation
                limit = violation[______]
                # Calculate excess
                excess = adjusted[symbol] - limit
                # TODO: Cap at limit
                adjusted[symbol] = ______
                # Redistribute excess to other positions
                other_symbols = [s for s in adjusted if s != symbol]
                # TODO: Calculate redistribution per symbol
                per_symbol = excess / ______(other_symbols)
                for s in other_symbols:
                    adjusted[s] += per_symbol
                adjustments_made.append(f"Capped {symbol} at {limit:.1%}")
            elif violation['type'] == 'var':
                # Scale down all positions
                scale_factor = 0.9
                for symbol in adjusted:
                    # TODO: Scale down each weight
                    adjusted[symbol] ______ scale_factor
                adjustments_made.append(f"Scaled down by {1-scale_factor:.1%}")
        # Normalize
        total = sum(adjusted.values())
        if total > 0:
            adjusted = {s: w/total for s, w in adjusted.items()}
    return {
        'original_weights': weights,
        'adjusted_weights': adjusted,
        'adjustments': adjustments_made,
        'iterations': iteration + 1
    }
# Test
result = adjust_weights_for_risk(combined_weights, risk_manager)
print(f"Adjustments made: {len(result['adjustments'])}")
print(f"Iterations: {result['iterations']}")
if result['adjustments']:
    for adj in result['adjustments']:
        print(f" - {adj}")
Solution
def adjust_weights_for_risk(weights: Dict[str, float],
                            risk_manager: RiskManager,
                            max_iterations: int = 10) -> Dict:
    """
    Adjust weights to meet risk limits.
    Args:
        weights: Target portfolio weights
        risk_manager: RiskManager instance
        max_iterations: Max adjustment iterations
    Returns:
        Adjusted weights and adjustment info
    """
    adjusted = weights.copy()
    adjustments_made = []
    for iteration in range(max_iterations):
        violations = risk_manager.check_limits(adjusted)
        if not violations:
            break
        for violation in violations:
            if violation['type'] == 'position_size':
                # Cap the oversized position and spread the excess evenly
                symbol = violation['symbol']
                limit = violation['limit']
                excess = adjusted[symbol] - limit
                adjusted[symbol] = limit
                other_symbols = [s for s in adjusted if s != symbol]
                per_symbol = excess / len(other_symbols)
                for s in other_symbols:
                    adjusted[s] += per_symbol
                adjustments_made.append(f"Capped {symbol} at {limit:.1%}")
            elif violation['type'] == 'var':
                scale_factor = 0.9
                for symbol in adjusted:
                    adjusted[symbol] *= scale_factor
                adjustments_made.append(f"Scaled down by {1-scale_factor:.1%}")
        # Renormalize so weights sum to 1. Note that this cancels the uniform
        # scale-down above; to actually reduce exposure after a 'var'
        # violation, the freed weight would have to be held as cash instead.
        total = sum(adjusted.values())
        if total > 0:
            adjusted = {s: w/total for s, w in adjusted.items()}
    return {
        'original_weights': weights,
        'adjusted_weights': adjusted,
        'adjustments': adjustments_made,
        'iterations': iteration + 1
    }
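One behavior of the solution is worth checking by hand: after a 'var' violation every weight is multiplied by 0.9, but the normalization step then rescales the weights to sum to 1, which restores the original proportions. The sketch below demonstrates this with made-up weights and shows one possible alternative that parks the freed weight in a hypothetical `CASH` entry. Both helper names and the `CASH` key are assumptions for illustration, and whether `RiskManager.check_limits` accepts a cash position is not something the course code confirms.

```python
from typing import Dict


def scale_then_normalize(weights: Dict[str, float], scale: float) -> Dict[str, float]:
    """Scale every weight, then renormalize to sum to 1 (as the solution does)."""
    scaled = {s: w * scale for s, w in weights.items()}
    total = sum(scaled.values())
    return {s: w / total for s, w in scaled.items()}


def scale_into_cash(weights: Dict[str, float], scale: float,
                    cash_key: str = "CASH") -> Dict[str, float]:
    """Scale risky weights and hold the remainder as cash (hypothetical alternative)."""
    scaled = {s: w * scale for s, w in weights.items()}
    scaled[cash_key] = 1.0 - sum(scaled.values())
    return scaled


weights = {"AAA": 0.5, "BBB": 0.3, "CCC": 0.2}  # made-up tickers
same = scale_then_normalize(weights, 0.9)
derisked = scale_into_cash(weights, 0.9)

print("Scale + normalize:", {s: round(w, 4) for s, w in same.items()})
print("Scale into cash:  ", {s: round(w, 4) for s, w in derisked.items()})
```

The first result matches the input weights, so the portfolio's VaR is unchanged; the second keeps 10% out of the market, which is what a scale-down is usually meant to achieve.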
Part 4: Execution Engine
Build an execution system that:
- Calculates required trades to reach target weights
- Estimates transaction costs
- Manages order generation
from typing import Optional  # needed for the estimate_costs signature
@dataclass
class Trade:
    """Represents a trade order."""
    symbol: str
    side: str
    quantity: int
    price: float
    value: float
    reason: str = ""
class ExecutionEngine:
    """
    Trade execution engine.
    """
    def __init__(self, data_pipeline: DataPipeline,
                 transaction_cost_bps: float = 10,
                 min_trade_value: float = 1000):
        self.data = data_pipeline
        self.transaction_cost_bps = transaction_cost_bps
        self.min_trade_value = min_trade_value
        self.pending_trades: List[Trade] = []
    def calculate_trades(self, current_holdings: Dict[str, int],
                         target_weights: Dict[str, float],
                         portfolio_value: float) -> List[Trade]:
        """Calculate trades to reach target weights."""
        prices = self.data.get_latest_prices()
        trades = []
        # Include symbols that are held but no longer targeted, so they get sold
        held = {s for s, shares in current_holdings.items() if shares != 0}
        for symbol in sorted(set(target_weights) | held):
            current_shares = current_holdings.get(symbol, 0)
            current_value = current_shares * prices[symbol]
            current_weight = current_value / portfolio_value if portfolio_value > 0 else 0
            target_weight = target_weights.get(symbol, 0)
            target_value = target_weight * portfolio_value
            # Whole shares only; int() rounds toward zero
            target_shares = int(target_value / prices[symbol])
            shares_diff = target_shares - current_shares
            trade_value = abs(shares_diff * prices[symbol])
            # Skip no-op and very small trades
            if shares_diff != 0 and trade_value >= self.min_trade_value:
                trade = Trade(
                    symbol=symbol,
                    side='buy' if shares_diff > 0 else 'sell',
                    quantity=abs(shares_diff),
                    price=prices[symbol],
                    value=trade_value,
                    reason=f"Rebalance: {current_weight:.1%} -> {target_weight:.1%}"
                )
                trades.append(trade)
        # Sells first (to free up cash), largest trades first within each side
        trades.sort(key=lambda t: (t.side == 'buy', -t.value))
        self.pending_trades = trades
        return trades
    def estimate_costs(self, trades: Optional[List[Trade]] = None) -> Dict:
        """Estimate transaction costs."""
        # Fall back to pending trades only when no list is passed at all
        if trades is None:
            trades = self.pending_trades
        total_value = sum(t.value for t in trades)
        total_cost = total_value * (self.transaction_cost_bps / 10000)
        return {
            'num_trades': len(trades),
            'total_value': total_value,
            'estimated_cost': total_cost,
            'cost_bps': total_cost / total_value * 10000 if total_value > 0 else 0
        }
    def get_trade_summary(self) -> pd.DataFrame:
        """Get summary of pending trades."""
        if not self.pending_trades:
            return pd.DataFrame()
        return pd.DataFrame([
            {
                'Symbol': t.symbol,
                'Side': t.side.upper(),
                'Quantity': t.quantity,
                'Price': t.price,
                'Value': t.value
            }
            for t in self.pending_trades
        ])
# Create execution engine
execution = ExecutionEngine(pipeline, transaction_cost_bps=10)
# Simulate current holdings
portfolio_value = 1_000_000
prices = pipeline.get_latest_prices()
initial_weight = 1 / len(UNIVERSE)
current_holdings = {}
for symbol in UNIVERSE:
    target_value = portfolio_value * initial_weight
    current_holdings[symbol] = int(target_value / prices[symbol])
# Calculate trades
trades = execution.calculate_trades(current_holdings, combined_weights, portfolio_value)
print("\nTrade Summary:")
print("=" * 60)
print(execution.get_trade_summary().to_string(index=False))
print("\nCost Estimate:")
costs = execution.estimate_costs()
print(f" Total value: ${costs['total_value']:,.0f}")
print(f" Estimated cost: ${costs['estimated_cost']:,.2f}")
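The share and cost arithmetic behind `calculate_trades` and `estimate_costs` is easy to verify by hand for a single position. The sketch below repeats that arithmetic as a standalone function; the name `trade_for_target` and the example numbers are made up for illustration and are not part of the course code.

```python
def trade_for_target(current_shares: int, target_weight: float,
                     portfolio_value: float, price: float,
                     cost_bps: float = 10.0) -> dict:
    """Size one rebalancing trade and estimate its cost in dollars."""
    # Whole shares only; int() rounds toward zero
    target_shares = int(target_weight * portfolio_value / price)
    shares_diff = target_shares - current_shares
    value = abs(shares_diff) * price
    if shares_diff > 0:
        side = "buy"
    elif shares_diff < 0:
        side = "sell"
    else:
        side = "none"
    return {
        "side": side,
        "quantity": abs(shares_diff),
        "value": value,
        # 1 basis point = 0.01% = 1/10,000 of traded value
        "cost": value * cost_bps / 10_000,
    }


# Holding 500 shares at $250; target is 15% of a $1,000,000 portfolio
example = trade_for_target(current_shares=500, target_weight=0.15,
                           portfolio_value=1_000_000, price=250.0)
print(example)
```

The target position is $150,000, or 600 shares, so the trade buys 100 shares worth $25,000, and 10 bps of that is $25.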
Exercise C.4: Complete Trading System (Open-ended)
Integrate all components into a unified trading system class.
# Exercise C.4: Complete Trading System (Open-ended)
#
# Build a TradingSystem class that:
# - Integrates data pipeline, strategy engine, risk manager, and execution engine
# - Implements a run_cycle() method that:
# 1. Updates market data
# 2. Calculates target weights from strategies
# 3. Checks risk limits and adjusts if needed
# 4. Generates trades and estimates costs
# - Tracks portfolio state (holdings, value, PnL)
# - Generates comprehensive reports
#
# Your implementation:
Solution
class TradingSystem:
    """
    Complete quantitative trading system.
    """
    def __init__(self, name: str, universe: List[str]):
        self.name = name
        self.universe = universe
        # Components
        self.data = DataPipeline(universe)
        self.strategies = StrategyEngine(self.data)
        self.risk = RiskManager(self.data)
        self.execution = ExecutionEngine(self.data)
        # State
        self.holdings: Dict[str, int] = {}
        self.portfolio_value = 0
        self.target_weights: Dict[str, float] = {}
        self.is_initialized = False
    def initialize(self, start_date: str, initial_capital: float):
        """Initialize the trading system."""
        print(f"Initializing {self.name}...")
        self.data.fetch_historical_data(start_date)
        self.portfolio_value = initial_capital
        self.holdings = {symbol: 0 for symbol in self.universe}
        self.is_initialized = True
        print(f"System initialized with ${initial_capital:,.0f}")
    def add_strategy(self, strategy: Strategy, allocation: float):
        """Add a strategy to the system."""
        self.strategies.add_strategy(strategy, allocation)
    def run_cycle(self) -> Dict:
        """Run one trading cycle."""
        if not self.is_initialized:
            raise RuntimeError("System not initialized")
        result = {
            'timestamp': datetime.now().isoformat(),
            'status': 'success',
            'actions': []
        }
        # Calculate target weights
        self.target_weights = self.strategies.combine_weights()
        result['target_weights'] = self.target_weights.copy()
        result['actions'].append('Calculated target weights')
        # Check risk limits
        violations = self.risk.check_limits(self.target_weights)
        result['risk_violations'] = len(violations)
        if violations:
            result['actions'].append(f'Found {len(violations)} risk violations')
            result['status'] = 'risk_alert'
        else:
            result['actions'].append('Risk limits OK')
        # Calculate trades
        trades = self.execution.calculate_trades(
            self.holdings,
            self.target_weights,
            self.portfolio_value
        )
        result['num_trades'] = len(trades)
        result['actions'].append(f'Generated {len(trades)} trades')
        # Estimate costs
        result['estimated_costs'] = self.execution.estimate_costs()
        # Risk metrics
        result['risk_metrics'] = self.risk.calculate_risk_metrics(self.target_weights)
        return result
    def generate_report(self) -> str:
        """Generate comprehensive system report."""
        report = []
        report.append("="*60)
        report.append(f"TRADING SYSTEM REPORT: {self.name}")
        report.append(f"Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
        report.append("="*60)
        report.append("")
        report.append("PORTFOLIO SUMMARY")
        report.append("-"*40)
        report.append(f"Total Value: ${self.portfolio_value:,.0f}")
        report.append(f"Universe: {len(self.universe)} assets")
        report.append(f"Strategies: {len(self.strategies.strategies)}")
        report.append("")
        report.append("TARGET ALLOCATION")
        report.append("-"*40)
        for symbol, weight in sorted(self.target_weights.items(), key=lambda x: -x[1]):
            report.append(f" {symbol}: {weight:.1%}")
        report.append("")
        if self.target_weights:
            metrics = self.risk.calculate_risk_metrics(self.target_weights)
            report.append("RISK METRICS")
            report.append("-"*40)
            report.append(f" Expected Return: {metrics['annual_return']:.2%}")
            report.append(f" Volatility: {metrics['annual_volatility']:.2%}")
            report.append(f" Sharpe Ratio: {metrics['sharpe_ratio']:.2f}")
            report.append(f" VaR (95%): {metrics['var_95']:.2%}")
            report.append("")
        report.append("="*60)
        return "\n".join(report)
# Test
system = TradingSystem("Capstone System", UNIVERSE)
system.initialize('2020-01-01', 1_000_000)
system.add_strategy(MeanVarianceStrategy(target_return=0.08), 0.40)
system.add_strategy(RiskParityStrategy(), 0.35)
system.add_strategy(MinimumVolatilityStrategy(), 0.25)
result = system.run_cycle()
print(f"Status: {result['status']}")
print(f"Trades: {result['num_trades']}")
print(system.generate_report())
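The exercise also asks the system to track portfolio state (holdings, value, PnL), while `run_cycle` above stops at generating trades. One way to close that gap is sketched below, assuming every trade fills completely at its quoted price. The `Fill` dataclass and the `apply_fills` and `mark_to_market` helpers are hypothetical names introduced here, and the flat basis-point cost mirrors the `transaction_cost_bps` idea from the execution engine.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Fill:
    """A fully executed trade (hypothetical stand-in for the course's Trade)."""
    symbol: str
    side: str  # 'buy' or 'sell'
    quantity: int
    price: float


def apply_fills(holdings: Dict[str, int], cash: float, fills: List[Fill],
                cost_bps: float = 10.0) -> Tuple[Dict[str, int], float]:
    """Return updated holdings and cash after applying fills and costs."""
    updated = dict(holdings)  # leave the caller's dict untouched
    for f in fills:
        notional = f.quantity * f.price
        cost = notional * cost_bps / 10_000
        if f.side == "buy":
            updated[f.symbol] = updated.get(f.symbol, 0) + f.quantity
            cash -= notional + cost
        elif f.side == "sell":
            updated[f.symbol] = updated.get(f.symbol, 0) - f.quantity
            cash += notional - cost
        else:
            raise ValueError(f"Unknown side: {f.side}")
    return updated, cash


def mark_to_market(holdings: Dict[str, int], cash: float,
                   prices: Dict[str, float]) -> float:
    """Portfolio value = cash + market value of all positions."""
    return cash + sum(qty * prices[s] for s, qty in holdings.items())


# Made-up example: buy 100 shares at $50, then the price moves to $55
initial_capital = 100_000.0
holdings, cash = apply_fills({"AAA": 0}, initial_capital,
                             [Fill("AAA", "buy", 100, 50.0)])
value = mark_to_market(holdings, cash, {"AAA": 55.0})
print(f"Holdings: {holdings}, Cash: ${cash:,.2f}, PnL: ${value - initial_capital:,.2f}")
```

The PnL here is the $500 price gain minus the $5 transaction cost. A fuller implementation would also handle partial fills and slippage.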
Exercise C.5: Performance Dashboard (Open-ended)
Create a performance visualization dashboard for the trading system.
# Exercise C.5: Performance Dashboard (Open-ended)
#
# Build a function that creates a 4-panel visualization:
# - Panel 1: Cumulative returns (portfolio vs benchmark)
# - Panel 2: Drawdown chart
# - Panel 3: Portfolio allocation pie chart
# - Panel 4: Rolling Sharpe ratio
#
# The function should also print summary statistics:
# - Total return, alpha, max drawdown, Sharpe ratio
#
# Your implementation:
Solution
def create_performance_dashboard(returns: pd.DataFrame,
                                 weights: Dict[str, float],
                                 benchmark_symbol: str = 'SPY'):
    """
    Create a comprehensive performance dashboard.
    Args:
        returns: Asset returns DataFrame
        weights: Portfolio weights
        benchmark_symbol: Benchmark ticker
    """
    # Calculate portfolio returns
    weight_array = np.array([weights.get(s, 0) for s in returns.columns])
    port_returns = (returns * weight_array).sum(axis=1)
    port_cumulative = (1 + port_returns).cumprod()
    # Benchmark returns
    bench_returns = returns[benchmark_symbol]
    bench_cumulative = (1 + bench_returns).cumprod()
    # Drawdown, expressed in percent for plotting
    running_max = port_cumulative.cummax()
    drawdown = (port_cumulative - running_max) / running_max * 100
    # Rolling Sharpe (63 trading days, risk-free rate taken as zero)
    rolling_sharpe = port_returns.rolling(63).mean() / port_returns.rolling(63).std() * np.sqrt(252)
    # Create figure
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))
    # Panel 1: Cumulative returns
    axes[0, 0].plot(port_cumulative.index, port_cumulative, label='Portfolio', linewidth=2)
    axes[0, 0].plot(bench_cumulative.index, bench_cumulative, label=benchmark_symbol, linewidth=2, alpha=0.7)
    axes[0, 0].set_title('Cumulative Returns')
    axes[0, 0].set_ylabel('Growth of $1')
    axes[0, 0].legend()
    axes[0, 0].grid(True, alpha=0.3)
    # Panel 2: Drawdown
    axes[0, 1].fill_between(drawdown.index, 0, drawdown, alpha=0.7, color='red')
    axes[0, 1].set_title('Drawdown')
    axes[0, 1].set_ylabel('Drawdown (%)')
    axes[0, 1].grid(True, alpha=0.3)
    # Panel 3: Allocation (a pie chart can only show positive weights)
    sorted_weights = {s: w for s, w in sorted(weights.items(), key=lambda x: -x[1]) if w > 0}
    colors = plt.cm.Set3(np.linspace(0, 1, len(sorted_weights)))
    axes[1, 0].pie(list(sorted_weights.values()), labels=list(sorted_weights.keys()),
                   autopct='%1.1f%%', colors=colors)
    axes[1, 0].set_title('Target Allocation')
    # Panel 4: Rolling Sharpe
    axes[1, 1].plot(rolling_sharpe.index, rolling_sharpe, linewidth=1)
    axes[1, 1].axhline(rolling_sharpe.mean(), color='red', linestyle='--',
                       label=f'Mean: {rolling_sharpe.mean():.2f}')
    axes[1, 1].set_title('Rolling Sharpe Ratio (63-day)')
    axes[1, 1].legend()
    axes[1, 1].grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()
    # Print summary
    print("\nPerformance Summary:")
    print("="*50)
    print(f"Total Return: {(port_cumulative.iloc[-1] - 1):.2%}")
    print(f"Benchmark Return: {(bench_cumulative.iloc[-1] - 1):.2%}")
    # "Alpha" here is the simple excess of cumulative return over the benchmark
    print(f"Alpha: {(port_cumulative.iloc[-1] - bench_cumulative.iloc[-1]):.2%}")
    # drawdown is already in percent, so format as a plain number
    print(f"Max Drawdown: {drawdown.min():.2f}%")
    print(f"Sharpe Ratio: {port_returns.mean() / port_returns.std() * np.sqrt(252):.2f}")
# Test
create_performance_dashboard(pipeline.returns, combined_weights)
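The exercise asks for alpha, and the summary above reports the gap between cumulative portfolio and benchmark returns. A regression-based alpha (Jensen's alpha), which also accounts for how much benchmark exposure the portfolio carries, can be estimated as sketched below. The function name `alpha_beta` is hypothetical, the risk-free rate is assumed to be zero, and the sample returns are made up; pure Python is used so the arithmetic is visible, though `np.polyfit` on the two return series would give the same slope and intercept.

```python
from typing import Sequence, Tuple


def alpha_beta(port: Sequence[float], bench: Sequence[float],
               periods_per_year: int = 252) -> Tuple[float, float]:
    """OLS of portfolio on benchmark returns: (annualized alpha, beta)."""
    if len(port) != len(bench) or len(port) < 2:
        raise ValueError("need two equally long series with at least 2 points")
    n = len(port)
    mean_p = sum(port) / n
    mean_b = sum(bench) / n
    cov = sum((p - mean_p) * (b - mean_b) for p, b in zip(port, bench))
    var_b = sum((b - mean_b) ** 2 for b in bench)
    if var_b == 0:
        raise ValueError("benchmark returns have zero variance")
    beta = cov / var_b                      # slope of the regression line
    per_period_alpha = mean_p - beta * mean_b  # intercept of the regression line
    return per_period_alpha * periods_per_year, beta


# Made-up data: portfolio = 2 bps/day of alpha + 1.5x the benchmark
bench = [0.01, -0.02, 0.015, 0.0, -0.005]
port = [0.0002 + 1.5 * b for b in bench]
alpha, beta = alpha_beta(port, bench)
print(f"Alpha (annualized): {alpha:.2%}, Beta: {beta:.2f}")
```

A portfolio with beta above 1 can beat its benchmark in a rising market with zero alpha, which is why the two measures can disagree.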
Exercise C.6: System Extensions (Open-ended)
Extend the trading system with additional features.
# Exercise C.6: System Extensions (Open-ended)
#
# Choose ONE of the following extensions to implement:
#
# Option A: Momentum Strategy
# - Implement a momentum strategy that ranks assets by recent performance
# - Weight assets based on momentum score
# - Add to the strategy engine
#
# Option B: Portfolio Rebalancing Logic
# - Implement a rebalancing trigger (drift-based or calendar-based)
# - Only generate trades when rebalancing is triggered
# - Track rebalancing history
#
# Option C: Performance Attribution
# - Implement Brinson attribution (allocation + selection effects)
# - Break down performance by asset contribution
# - Generate attribution reports
#
# Your implementation:
Solution (Option A: Momentum Strategy)
class MomentumStrategy(Strategy):
    """
    Momentum-based strategy.
    Ranks assets by recent performance and overweights winners.
    """
    def __init__(self, lookback_days: int = 252, top_n: int = 4):
        super().__init__("Momentum")
        self.lookback_days = lookback_days
        self.top_n = top_n
    def calculate_weights(self, data: DataPipeline) -> Dict[str, float]:
        # Calculate momentum scores (total return over lookback)
        recent_prices = data.prices.tail(self.lookback_days)
        momentum = (recent_prices.iloc[-1] / recent_prices.iloc[0]) - 1
        # Rank and select top performers
        rankings = momentum.rank(ascending=False)
        # Weight based on momentum score
        weights = {}
        total_momentum = 0
        for symbol in data.universe:
            if rankings[symbol] <= self.top_n:
                # Only invest in top N performers
                score = max(momentum[symbol], 0.01)  # Floor at small positive
                weights[symbol] = score
                total_momentum += score
            else:
                weights[symbol] = 0
        # Normalize weights
        if total_momentum > 0:
            weights = {s: w / total_momentum for s, w in weights.items()}
        else:
            # Equal weight fallback
            n = len(data.universe)
            weights = {s: 1/n for s in data.universe}
        return weights
# Test the momentum strategy
momentum_strat = MomentumStrategy(lookback_days=126, top_n=4)
momentum_weights = momentum_strat.calculate_weights(pipeline)
print("Momentum Strategy Weights:")
print("="*40)
for symbol, weight in sorted(momentum_weights.items(), key=lambda x: -x[1]):
    if weight > 0:
        print(f" {symbol}: {weight:.1%}")
# Add to strategy engine
engine.add_strategy(momentum_strat, 0.20)  # 20% allocation
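For Option B, a drift-based trigger can be quite small. The sketch below rebalances only when some asset's current weight has moved more than a threshold away from its target. The function names `max_drift` and `needs_rebalance`, the made-up tickers, and the 5-percentage-point default threshold are illustrative assumptions, not values prescribed by the course.

```python
from typing import Dict


def max_drift(current: Dict[str, float], target: Dict[str, float]) -> float:
    """Largest absolute gap between current and target weight across assets."""
    symbols = set(current) | set(target)  # missing symbols count as weight 0
    return max(
        (abs(current.get(s, 0.0) - target.get(s, 0.0)) for s in symbols),
        default=0.0,
    )


def needs_rebalance(current: Dict[str, float], target: Dict[str, float],
                    threshold: float = 0.05) -> bool:
    """True when any asset has drifted beyond the threshold."""
    return max_drift(current, target) > threshold


target = {"AAA": 0.50, "BBB": 0.50}
current = {"AAA": 0.58, "BBB": 0.42}  # AAA rallied since the last rebalance

print(f"Max drift: {max_drift(current, target):.1%}")
print(f"Rebalance at 5% threshold:  {needs_rebalance(current, target)}")
print(f"Rebalance at 10% threshold: {needs_rebalance(current, target, 0.10)}")
```

Calling `needs_rebalance` before `calculate_trades` and appending the date and drift to a list whenever it returns True would also cover the "track rebalancing history" requirement.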
Capstone Completion Checklist
Core Components
- [ ] Data pipeline fetches and processes market data
- [ ] Multiple strategies implemented and combined
- [ ] Risk management with VaR, limits, and alerts
- [ ] Execution engine calculates trades and costs
- [ ] System integration with run cycle
Exercises Completed
- [ ] C.1: Data Quality Validator
- [ ] C.2: Strategy Performance Tracker
- [ ] C.3: Dynamic Risk Adjustment
- [ ] C.4: Complete Trading System
- [ ] C.5: Performance Dashboard
- [ ] C.6: System Extensions
Congratulations!
You've completed Course 3: Quantitative Finance & Portfolio Theory!
What You've Built
A complete quantitative trading system with:
- Multi-source data pipeline
- Multi-strategy portfolio optimization
- Real-time risk management
- Cost-aware execution
- Performance visualization
Key Skills Developed
- Portfolio Theory: Mean-variance, risk parity, factor models
- Risk Management: VaR, CVaR, stress testing, limits
- System Design: Component integration, state management
- Production Skills: Monitoring, execution, deployment
Next Steps
- Paper Trade: Test your system with simulated trading
- Iterate: Refine strategies based on performance
- Deploy: Move to cloud infrastructure (see Modules 17-18)
- Continue Learning: Course 4 (ML for Finance) awaits!
Good luck on your quantitative trading journey!